[57]

ABSTRACT

A multimedia collaboration system that integrates separate real-time and asynchronous networks (the former for real-time audio and video, and the latter for control signals and textual, graphical and other data) in a manner that is interoperable across different computer and network operating system platforms and which closely approximates the experience of face-to-face collaboration, while liberating the participants from the limitations of time and distance. These capabilities are achieved by exploiting a variety of hardware, software and networking technologies in a manner that preserves the quality and integrity of audio/video/data and other multimedia information, even after wide area transmission, and at a significantly reduced networking cost as compared to what would be required by presently known approaches. The system architecture is readily scalable to the largest enterprise network environments. It accommodates differing levels of collaborative capabilities available to individual users and permits high-quality audio and video capabilities to be readily superimposed onto existing personal computers and workstations and their interconnecting LANs and WANs. In a particular preferred embodiment, a plurality of geographically dispersed multimedia LANs are interconnected by a WAN. The demands made on the WAN are significantly reduced by employing multi-hopping techniques, including dynamically avoiding the unnecessary decompression of data at intermediate hops, and exploiting video mosaicing, cut-and-paste and audio mixing technologies so that significantly fewer wide area transmission paths are required while maintaining the high quality of the transmitted audio/video.

43 Claims, 34 Drawing Sheets 
MULTIMEDIA COLLABORATION SYSTEM ARRANGEMENT FOR ROUTING COMPRESSED AV SIGNAL THROUGH A PARTICIPANT SITE WITHOUT DECOMPRESSING THE AV SIGNAL

BACKGROUND OF THE INVENTION

The present invention relates to computer-based systems for enhancing collaboration between and among individuals who are separated by distance and/or time (referred to herein as "distributed collaboration"). Principal among the invention's goals is to replicate in a desktop environment, to the maximum extent possible, the full range, level and intensity of interpersonal communication and information sharing which would occur if all the participants were together in the same room at the same time (referred to herein as "face-to-face collaboration").

It is well known to behavioral scientists that interpersonal communication involves a large number of subtle and complex visual cues, referred to by names like "eye contact" and "body language," which provide additional information over and above the spoken words and explicit gestures. These cues are, for the most part, processed subconsciously by the participants, and often control the course of a meeting.

In addition to spoken words, demonstrative gestures and behavioral cues, collaboration often involves the sharing of visual information (e.g., printed material such as drawings, photographs, charts and graphs, as well as videotapes and computer-based animations, visualizations and other displays) in such a way that the participants can collectively and interactively examine, discuss, annotate and revise the information. This combination of spoken words, gestures, visual cues and interactive data sharing significantly enhances the effectiveness of collaboration in a variety of contexts, such as "brainstorming" and "problem solving" sessions among professionals in a particular field, consultations between one or more experts and one or more clients, sensitive business or political negotiations, and the like. In distributed collaboration settings, then, where the participants cannot be in the same place at the same time, the beneficial effects of face-to-face collaboration will be realized only to the extent that each of the remotely located participants can be "recreated" at each site.

To illustrate the difficulties inherent in reproducing the beneficial effects of face-to-face collaboration in a distributed collaboration environment, consider the case of decision-making in the fast-moving commodities trading markets, where many thousands of dollars of profit (or loss) may depend on an expert trader making the right decision within hours, or even minutes, of receiving a request from a distant client. The expert requires immediate access to a wide range of potentially relevant information such as financial data, historical pricing information, current price quotes, newswire services, government policies and programs, economic forecasts, weather reports, etc. Much of this information can be processed by the expert in isolation. However, before making a decision to buy or sell, he or she will frequently need to discuss the information with other experts, who may be geographically dispersed, and with the client. One or more of these other experts may be in a meeting, on another call, or otherwise temporarily unavailable. In this event, the expert must communicate "asynchronously," to bridge time as well as distance.

As discussed below, prior art desktop videoconferencing systems provide, at best, only a partial solution to the challenges of distributed collaboration in real time, primarily because of their lack of high-quality video (which is necessary for capturing the visual cues discussed above) and their limited data sharing capabilities. Similarly, telephone answering machines, voice mail, fax machines and conventional electronic mail systems provide incomplete solutions to the problems presented by deferred (asynchronous) collaboration because they are totally incapable of communicating visual cues, gestures, etc. and, like conventional videoconferencing systems, are generally limited in the richness of the data that can be exchanged.

It has been proposed to extend traditional videoconferencing capabilities from conference centers, where groups of participants must assemble in the same room, to the desktop, where individual participants may remain in their office or home. Such a system is disclosed in U.S. Pat. No. 4,710,917 to Tompkins et al. for Video Conferencing Network issued on Dec. 1, 1987. It has also been proposed to augment such video conferencing systems with limited "video mail" facilities. However, such dedicated videoconferencing systems (and extensions thereof) do not effectively leverage the investment in existing embedded information infrastructures (such as desktop personal computers and workstations, local area network (LAN) and wide area network (WAN) environments, building wiring, etc.) to facilitate interactive sharing of data in the form of text, images, charts, graphs, recorded video, screen displays and the like. That is, they attempt to add computing capabilities to a videoconferencing system, rather than adding multimedia and collaborative capabilities to the user's existing computer system. Thus, while such systems may be useful in limited contexts, they do not provide the capabilities required for maximally effective collaboration, and are not cost-effective.

Conversely, audio and video capture and processing capabilities have recently been integrated into desktop and portable personal computers and workstations (hereinafter generically referred to as "workstations"). These capabilities have been used primarily in desktop multimedia authoring systems for producing CD-ROM-based works. While such systems are capable of processing, combining, and recording audio, video and data locally (i.e., at the desktop), they do not adequately support networked collaborative environments, principally due to the substantial bandwidth requirements for real-time transmission of high-quality, digitized audio and full-motion video which preclude conventional LANs from supporting more than a few workstations. Thus, although currently available desktop multimedia computers frequently include videoconferencing and other multimedia or collaborative capabilities within their advertised feature set (see, e.g., A. Reinhardt, "Video Conquers the Desktop," BYTE, September 1993, pp. 64-90), such systems have not yet solved the many problems inherent in any practical implementation of a scalable collaboration system.

SUMMARY OF THE INVENTION

In accordance with the present invention, computer hardware, software and communications technologies are combined in novel ways to produce a multimedia collaboration system that greatly facilitates distributed collaboration, in part by replicating the benefits of face-to-face collaboration. The system tightly integrates a carefully selected set of multimedia and collaborative capabilities, principal among which are desktop teleconferencing and multimedia mail.

As used herein, desktop teleconferencing includes real-time audio and/or video teleconferencing, as well as data
conferencing. Data conferencing, in turn, includes snapshot sharing (sharing of "snapshots" of selected regions of the user's screen), application sharing (shared control of running applications), shared whiteboard (equivalent to sharing a "blank" window), and associated telepointing and annotation capabilities. Teleconferences may be recorded and stored for later playback, including both audio/video and all data interactions.

While desktop teleconferencing supports real-time interactions, multimedia mail permits the asynchronous exchange of arbitrary multimedia documents, including previously recorded teleconferences. Indeed, it is to be understood that the multimedia capabilities underlying desktop teleconferencing and multimedia mail also greatly facilitate the creation, viewing, and manipulation of high-quality multimedia documents in general, including animations and visualizations that might be developed, for example, in the course of information analysis and modeling. Further, these animations and visualizations may be generated for individual rather than collaborative use, such that the present invention has utility beyond a collaboration context.

The preferred embodiment of the invention is a collaborative multimedia workstation (CMW) system wherein very high-quality audio and video capabilities can be readily superimposed onto an enterprise's existing computing and network infrastructure, including workstations, LANs, WANs, and building wiring.

In a preferred embodiment, the system architecture employs separate real-time and asynchronous networks (the former for real-time audio and video, and the latter for non-real-time audio and video, text, graphics and other data, as well as control signals). These networks are interoperable across different computers (e.g., Macintosh, Intel-based PCs, and Sun workstations), operating systems (e.g., Apple System 7, DOS/Windows, and UNIX) and network operating systems (e.g., Novell Netware and Sun ONC+). In many cases, both networks can actually share the same cabling and wall jack connector.

The system architecture also accommodates the situation in which the user's desktop computing and/or communications equipment provides varying levels of media-handling capability. For example, a collaboration session, whether real-time or asynchronous, may include participants whose equipment provides capabilities ranging from audio only (a telephone) or data only (a personal computer with a modem) to a full complement of real-time, high-fidelity audio and full-motion video, and high-speed data network facilities.

The CMW system architecture is readily scalable to very large enterprise-wide network environments accommodating thousands of users. Further, it is an open architecture that can accommodate appropriate standards. Finally, the CMW system incorporates an intuitive, yet powerful, user interface, making the system easy to learn and use.

The present invention thus provides a distributed multimedia collaboration environment that achieves the benefits of face-to-face collaboration as nearly as possible, leverages ("snaps on to") existing computing and network infrastructure to the maximum extent possible, scales to very large networks consisting of thousands of workstations, accommodates emerging standards, and is easy to learn and use. The specific nature of the invention, as well as its objects, features, advantages and uses, will become more readily apparent from the following detailed description and examples, and from the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an enterprise view of a desktop collaboration system embodiment of the present invention.

FIGS. 2A and 2B are photographs which attempt to illustrate, to the extent possible in a still image, the high quality of the full-motion video and related user interface displays that appear on typical CMW screens which may be generated during operation of a preferred embodiment of the invention.

FIG. 3 is a block and schematic diagram of a preferred embodiment of a "multimedia local area network" (MLAN) in accordance with a desktop collaboration system embodiment of the present invention.

FIG. 4 is a block and schematic diagram illustrating how a plurality of geographically dispersed MLANs of the type shown in FIG. 3 can be connected via a wide area network in accordance with the present invention.

FIG. 5 is a schematic diagram illustrating how collaboration sites at distant locations L1-L8 are conventionally interconnected over a wide area network by individually connecting each site to every other site.

FIG. 6 is a schematic diagram illustrating how collaboration sites at distant locations L1-L8 are interconnected over a wide area network in a preferred embodiment of the invention using a multi-hopping approach.

FIG. 7 is a block diagram illustrating a preferred embodiment of video mosaicing circuitry provided in the MLAN of FIG. 3.

FIGS. 8A, 8B and 8C illustrate the video window on a typical CMW screen which may be generated during operation of a preferred embodiment of the present invention, and which contains only the callee for two-party calls (8A) and a video mosaic of all participants, e.g., for four-party (8B) or eight-party (8C) conference calls.

FIG. 9 is a block diagram illustrating a preferred embodiment of audio mixing circuitry provided in the MLAN of FIG. 3.

FIG. 10 is a block diagram illustrating video cut-and-paste circuitry provided in the MLAN of FIG. 3.

FIG. 11 is a schematic diagram illustrating typical operation of the video cut-and-paste circuitry in FIG. 10.

FIGS. 12-17 (consisting of FIGS. 12A, 12B, 13A, 13B, 14A, 14B, 15A, 15B, 16, 17A and 17B) illustrate various examples of how a preferred embodiment of the present invention provides video mosaicing, video cut-and-pasting, and audio mixing at a plurality of distant sites for transmission over a wide area network in order to provide, at the CMW of each conference participant, video images and audio captured from the other conference participants.

FIGS. 18A and 18B illustrate various preferred embodiments of a CMW which may be employed in accordance with the present invention.

FIG. 19 is a schematic diagram of a preferred embodiment of a CMW add-on box containing integrated audio and video I/O circuitry in accordance with the present invention.

FIG. 20 illustrates CMW software in accordance with a preferred embodiment of the present invention, integrated with standard multitasking operating system and applications software.

FIG. 21 illustrates software modules which may be provided for running on the MLAN Server in the MLAN of FIG. 3 for controlling operation of the AV and Data Networks.

FIG. 22 illustrates an enlarged example of "speed-dial" face icons of certain collaboration participants in a Collaboration Initiator window on a typical CMW screen which may be generated during operation of a preferred embodiment of the present invention.
FIG. 23 is a diagrammatic representation of the basic operating events occurring in a preferred embodiment of the present invention during initiation of a two-party call.

FIG. 24 is a block and schematic diagram illustrating how physical connections are established in the MLAN of FIG. 3 for physically connecting first and second workstations for a two-party videoconference call.

FIG. 25 is a block and schematic diagram illustrating how physical connections are preferably established in MLANs such as illustrated in FIG. 3, for a two-party call between a first CMW located at one site and a second CMW located at a remote site.

FIGS. 26 and 27 are block and schematic diagrams illustrating how conference bridging is preferably provided in the MLAN of FIG. 3.

FIG. 28 diagrammatically illustrates how a snapshot with annotations may be stored in a plurality of bitmaps during data sharing.

FIG. 29 is a schematic and diagrammatic illustration of the interaction among multimedia mail (MMM), multimedia call/conference recording (MMCR) and multimedia document management (MMDM) facilities.

FIG. 30 is a schematic and diagrammatic illustration of the multimedia document architecture employed in a preferred embodiment of the invention.

FIG. 31A illustrates a centralized Audio/Video Storage Server.

FIG. 31B is a schematic and diagrammatic illustration of the interactions between the Audio/Video Storage Server and the remainder of the CMW System.

FIG. 31C illustrates an alternative embodiment of the interactions illustrated in FIG. 31B.

FIG. 31D is a schematic and diagrammatic illustration of the integration of MMM, MMCR and MMDM facilities in a preferred embodiment of the invention.

FIG. 32 illustrates a generalized hardware implementation of a scalable Audio/Video Storage Server.

FIG. 33 illustrates a higher throughput version of the server illustrated in FIG. 32, using SCSI-based crosspoint switching to increase the number of possible simultaneous file transfers.

FIG. 34 [...]

FIGS. 35-42 illustrate a series of CMW screens which may be generated during operation of a preferred embodiment of the present invention for a typical scenario involving a remote expert who takes advantage of many of the features provided by the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

OVERALL SYSTEM ARCHITECTURE

Referring initially to FIG. 1, illustrated therein is an overall diagrammatic view of a multimedia collaboration system in accordance with the present invention. As shown, each of a plurality of "multimedia local area networks" (MLANs) 10 connects, via lines 13, a plurality of CMWs 12-1 to 12-10 and provides audio/video/data networking for supporting collaboration among CMW users. WAN 15 in turn connects multiple MLANs 10, and typically includes appropriate combinations of common carrier analog and digital transmission networks. Multiple MLANs 10 on the same physical premises may be connected via bridges/routers 11, as shown, to WANs and one another.

In accordance with the present invention, the system of FIG. 1 accommodates both "real time" delay- and jitter-sensitive signals (e.g., real-time audio and video teleconferencing) and classical asynchronous data (e.g., data control signals as well as shared textual, graphics and other media) communication among multiple CMWs 12 regardless of their location. Although only ten CMWs 12 are illustrated in FIG. 1, it will be understood that many more could be provided. As also indicated in FIG. 1, various other multimedia resources 16 (e.g., VCRs, laserdiscs, TV feeds, etc.) are connected to MLANs 10 and are thereby accessible by individual CMWs 12.

CMW 12 in FIG. 1 may use any of a variety of types of operating systems, such as Apple System 7, UNIX, DOS/Windows and OS/2. The CMWs can also have different types of window systems. Specific preferred embodiments of a CMW 12 are described hereinafter in connection with FIGS. 18A and 18B. Note that this invention allows for a mix of operating systems and window systems across individual CMWs.

In the preferred embodiment, CMW 12 in FIG. 1 provides real-time audio/video/data capabilities along with the usual data processing capabilities provided by its operating system. CMW 12 also provides for bidirectional communication, via lines 13, within MLAN 10 for audio/video signals as well as data signals. Audio/video signals transmitted from a CMW 12 typically comprise a high-quality live video image and audio of the CMW operator. These signals are obtained from a video camera and microphone at the CMW (via an add-on unit or partially or totally integrated into the CMW), processed, and then made available to low-cost network transmission subsystems.

Audio/video signals received by a CMW 12 from MLAN 10 may typically include: video images of one or more conference participants and associated audio, video and audio from multimedia mail, previously recorded audio/video from previous calls and conferences, and standard broadcast television (e.g., CNN). Received video signals are displayed on the CMW screen or on an adjacent monitor, and the accompanying audio is reproduced by a speaker [...] via a CMW add-on [...]

In the preferred embodiment, it has been found particularly advantageous to provide the above-described video at standard NTSC-quality TV performance (i.e., 30 frames per second at 640x480 pixels per frame and the equivalent of 24 bits of color per pixel) with accompanying high-fidelity audio (typically between 7 and 15 KHz). For example, FIG. 2A illustrates a CMW screen containing live, full-motion video of three conference participants, while FIG. 2B illustrates data shared and annotated by those conferees (lower left window).

MULTIMEDIA LOCAL AREA NETWORK

Referring next to FIG. 3, illustrated therein is a preferred embodiment of MLAN 10 having ten CMWs (12-1, 12-2, ... 12-10), coupled therein via lines 13a and 13b. MLAN 10 typically extends over a distance from a few hundred feet to a few miles, and is usually located within a building or a group of proximate buildings.

Given the current state of networking technologies, it is useful (for the sake of maintaining quality and minimizing
costs) to provide separate signal paths for real-time audio/video and classical asynchronous data communications (including digitized audio and video enclosures of multimedia mail messages that are free from real-time delivery constraints). At the moment, analog methods for carrying real-time audio/video are preferred. In the future, digital methods may be used. Eventually, digital audio and video signal paths may be multiplexed with the data signal path as a common digital stream. Another alternative is to multiplex real-time and asynchronous data paths together using analog multiplexing methods. For the purposes of the present application, however, we will treat these two signal paths as using physically separate wires. Further, as the current preferred embodiment uses analog networking for audio and video, it also physically separates the real-time and asynchronous switching vehicles and, in particular, assumes an analog audio/video switch. In the future, a common switching vehicle (e.g., ATM) could be used.

The MLAN 10 thus can be implemented in the preferred embodiment using conventional technology, such as typical Data LAN hubs 25 and A/V Switching Circuitry 30 (as used in television studios and other closed-circuit television networks), linked to the CMWs 12 via appropriate transceivers and unshielded twisted pair (UTP) wiring. Note in FIG. 1 that lines 13, which interconnect each CMW 12 within its respective MLAN 10, comprise two sets of lines 13a and 13b. Lines 13a provide bidirectional communication of audio/video within MLAN 10, while lines 13b provide for the bidirectional communication of data. This separation permits conventional LANs to be used for data communications and a supplemental network to be used for audio/video communications. Although this separation is advantageous in the preferred embodiment, it is again to be understood that audio/video/data networking can also be implemented using a single pair of lines for both audio/video and data communications via a very wide variety of analog and digital multiplexing schemes.

While lines 13a and 13b may be implemented in various ways, it is currently preferred to use commonly installed 4-pair UTP telephone wires, wherein one pair is used for incoming video with accompanying audio (mono or stereo) multiplexed in, wherein another pair is used for outgoing multiplexed audio/video, and wherein the remaining two pairs are used for carrying incoming and outgoing data in ways consistent with existing LANs. For example, 10BaseT Ethernet uses RJ-45 pins 1, 2, 3, and 6, leaving pins 4, 5, 7, and 8 available for the two A/V twisted pairs. The resulting system is compatible with standard (AT&T 258A, EIA/TIA 568, 8P8C, 10BaseT, ISDN, 6P6C, etc.) telephone wiring found commonly throughout telephone and LAN cable plants in most office buildings throughout the world. These UTP wires are used in a hierarchy or peer arrangements of star topologies to create MLAN 10, described below. Note that the distance range of the data wires often must match that of the video and audio. Various UTP-compatible data LAN networks may be used, such as Ethernet, token ring, FDDI, ATM, etc. For distances longer than the maximum distance specified by the data LAN protocol, data signals can be additionally processed for proper UTP operations.

As shown in FIG. 3, lines 13a from each CMW 12 are coupled to a conventional Data LAN hub 25, which facilitates the communication of data (including control signals) among such CMWs. Lines 13b in FIG. 3 are connected to A/V Switching Circuitry 30. One or more conference bridges 35 are coupled to A/V Switching Circuitry 30 and possibly (if needed) the Data LAN hub 25, via lines 35b and 35c, respectively, for providing multi-party conferencing in a particularly advantageous manner, as will hereinafter be described in detail. A WAN gateway 40 provides for bidirectional communication between MLAN 10 and WAN 15 in FIG. 1. For this purpose, Data LAN hub 25 and A/V Switching Circuitry 30 are coupled to WAN gateway 40 via outputs 25a and 30a, respectively. Other devices connect to the A/V Switching Circuitry 30 and Data LAN hub 25 to add additional features (such as multimedia mail, conference recording, etc.) as discussed below.

Control of A/V Switching Circuitry 30, conference bridges 35 and WAN gateway 40 in FIG. 3 is provided by MLAN Server 60 via lines 60b, 60c, and 60d, respectively. In a preferred embodiment, MLAN Server 60 supports the TCP/IP network protocol suite. Accordingly, software processes on CMWs 12 communicate with one another and MLAN Server 60 via MLAN 10 using these protocols. Other network protocols could also be used, such as IPX. The manner in which software running on MLAN Server 60 controls the operation of MLAN 10 will be described in detail hereinafter.

Note in FIG. 3 that Data LAN hub 25, A/V Switching Circuitry 30 and MLAN Server 60 also provide respective lines 25b, 30b, and 60e for coupling to additional multimedia resources 16 (FIG. 1), such as multimedia document management, multimedia databases, radio/TV channels, etc. Data LAN hub 25 (via bridges/routers 11 in FIG. 1) and A/V Switching Circuitry 30 additionally provide lines 25c and 30c for coupling to one or more other MLANs 10 which may be in the same locality (i.e., not far enough away to require use of WAN technology). Where WANs are required, WAN gateways 40 are used to provide highest quality compression methods and standards in a shared resource fashion, thus minimizing costs at the workstation for a given WAN quality level, as discussed below.

The basic operation of the preferred embodiment of the resulting collaboration system shown in FIGS. 1 and 3 will next be considered. Important features of the present invention reside in providing not only multi-party real-time desktop audio/video/data teleconferencing among geographically distributed CMWs, but also in providing from the same desktop audio/video/data/text/graphics mail capabilities, as well as access to other resources, such as databases, audio and video files, overview cameras, standard TV channels, etc. FIG. 2B illustrates a CMW screen showing a multimedia EMAIL mailbox (top left window) containing references to a number of received messages along with a video enclosure (top right window) to the selected message.

A/V Switching Circuitry 30 (whether digital or analog as in the preferred embodiment) provides common audio/video switching for CMWs 12, conference bridges 35, WAN gateway 40 and multimedia resources 16, as determined by MLAN Server 60, which in turn controls conference bridges 35 and WAN gateway 40. Similarly, asynchronous data is communicated within MLAN 10 utilizing common data communications formats where possible (e.g., for snapshot sharing) so that the system can handle such data in a common manner, regardless of origin, thereby facilitating multimedia mail and data sharing as well as audio/video communications.

For example, to provide multi-party teleconferencing, an initiating CMW 12 signals MLAN Server 60 via Data LAN hub 25 identifying the desired conference participants. After determining which of these conferees will accept the call, MLAN Server 60 controls A/V Switching Circuitry 30 (and CMW software via the data network) to set up the required audio/video and data paths to conferees at the same location as the initiating CMW.
When one or more conferees are at distant locations, the respective
MLAN Servers 60 of the involved MLANs 10, on a peer-to-peer basis,
control their respective A/V Switching Circuitry 30, conference
bridges 35, and WAN gateways 40 to set up appropriate communication
paths (via WAN 15 in FIG. 1) as required for interconnecting the
conferees. MLAN Servers 60 also communicate with one another via data
paths so that each MLAN 10 contains updated information as to the
capabilities of all of the system CMWs 12, and also the current
locations of all parties available for teleconferencing.

The data conferencing component of the above-described system
supports the sharing of visual information at one or more CMWs (as
described in greater detail below). This encompasses both "snapshot
sharing" (sharing "snapshots" of complete or partial screens, or of
one or more selected windows) and "application sharing" (sharing both
the control and display of running applications). When transferring
images, lossless or slightly lossy image compression can be used to
reduce network bandwidth requirements and user-perceived delay while
maintaining high image quality.

In all cases, any participant can point at or annotate the shared
data. These associated telepointers and annotations appear on every
participant's CMW screen as they are drawn (i.e., effectively in real
time). For example, note FIG. 2B which illustrates a typical CMW
screen during a multi-party teleconferencing session, wherein the
screen contains annotated shared data as well as video images of the
conferees. As described in greater detail below, all or portions of
the audio/video and data of the teleconference can be recorded at a
CMW (or within MLAN 10), complete with all the data interactions.

In the above-described preferred embodiment, audio/video file
services can be implemented either at the individual CMWs 12 or by
employing a centralized audio/video storage server. This is one
example of the many types of additional servers that can be added to
the basic system of MLANs 10. A similar approach is used for
incorporating other multimedia services, such as commercial TV
channels, multimedia mail, multimedia document management, multimedia
conference recording, visualization servers, etc. (as described in
greater detail below). Certainly, applications that run
self-contained on a CMW can be readily added, but the invention
extends this capability greatly in the way that MLAN 10, storage and
other functions are implemented and leveraged.

In particular, standard signal formats, network interfaces, user
interface messages, and call models can allow virtually any
multimedia resource to be smoothly integrated into the system.
Factors facilitating such smooth integration include: (i) a common
mechanism for user access across the network; (ii) a common metaphor
(e.g., placing a call) for the user to initiate use of such resource;
(iii) the ability for one function (e.g., a multimedia conference or
multimedia database) to access and exchange information with another
function (e.g., multimedia mail); and (iv) the ability to extend such
access of one networked function by another networked function to
relatively complex nestings of simpler functions (for example, record
a multimedia conference in which a group of users has accessed
multimedia mail messages and transferred them to a multimedia
database, and then send part of the conference recording just created
as a new multimedia mail message, utilizing a multimedia mail editor
if necessary).

A simple example of the smooth integration of functions made possible
by the above-described approach is that the GUI and software used for
snapshot sharing (described below) can also be used as an
input/output interface for multimedia mail and more general forms of
multimedia documents. This can be accomplished by structuring the
interprocess communication protocols to be uniform across all these
applications. More complicated examples, specifically multimedia
conference recording, multimedia mail and multimedia document
management, will be presented in detail below.

WIDE AREA NETWORK

Next to be described in connection with FIG. 4 is the advantageous
manner in which the present invention provides for real-time
audio/video/data communication among geographically dispersed MLANs
10 via WAN 15 (FIG. 1), whereby communication delays, cost and
degradation of video quality are significantly minimized from what
would otherwise be expected.

Four MLANs 10 are illustrated at locations A, B, C and D. CMWs 12-1
to 12-10, A/V Switching Circuitry 30, Data LAN hub 25, and WAN
gateway 40 at each location correspond to those shown in FIGS. 1 and
3. WAN gateway 40 in FIG. 4 will be seen to comprise a router/codec
(R&C) bank 42 coupled to WAN 15 via WAN switching multiplexer 44. The
router is used for data interconnection and the codec is used for
audio/video interconnection (for multimedia mail and document
transmission, as well as videoconferencing). Codecs from multiple
vendors, or supporting various compression algorithms, may be
employed. In the preferred embodiment, the router and codec are
combined with the switching multiplexer to form a single integrated
unit.

Typically, WAN 15 is comprised of T1 or ISDN common-carrier-provided
digital links (switched or dedicated), in which case WAN switching
multiplexers 44 are of the appropriate type (T1, ISDN, fractional T1,
T3, switched 56 Kbps, etc.). Note that the WAN switching multiplexer
44 typically creates subchannels whose bandwidth is a multiple of 64
Kbps (i.e., 256 Kbps, 384, 768, etc.) among the T1, T3 or ISDN
carriers. Inverse multiplexers may be required when using 56 Kbps
dedicated or switched services from these carriers.

In the MLAN 10 to WAN 15 direction, router/codec bank 42 in FIG. 4
provides conventional analog-to-digital conversion and compression of
audio/video signals received from A/V Switching Circuitry 30 for
transmission to WAN 15 via WAN switching multiplexer 44, along with
transmission and routing of data signals received from Data LAN hub
25. In the WAN 15 to MLAN 10 direction, each router/codec bank 42 in
FIG. 4 provides digital-to-analog conversion and decompression of
audio/video digital signals received from WAN 15 via WAN switching
multiplexer 44 for transmission to A/V Switching Circuitry 30, along
with the transmission to Data LAN hub 25 of data signals received
from WAN 15.

The system also provides optimal routes for audio/video signals
through the WAN. For example, in FIG. 4, location A can take either a
direct route to location D via path 47, or a two-hop route through
location C via paths 48 and 49. If the direct path 47 linking
location A and location D is unavailable, the multipath route via
location C and paths 48 and 49 could be used.

In a more complex network, several multi-hop routes are typically
available, in which case the routing system handles the decision
making, which for example can be based on network loading
considerations. Note the resulting two-level network hierarchy: a
MLAN 10 to MLAN 10 (i.e., site-to-site) service connecting codecs
with one another only at connection endpoints.

The cost savings made possible by providing the above-described
multi-hop capability (with intermediate codec bypassing) are very
significant as will become evident by noting the examples of FIGS. 5
and 6. FIG. 5 shows that using the conventional "fully connected
mesh" location-to-location approach, thirty-six WAN links are
required for interconnecting the nine locations L1 to L8. On the
other hand, using the above multi-hop capabilities, only nine WAN
links are required, as shown in FIG. 6. As the number of locations
increases, the difference in cost becomes even greater, growing as
the square of the number of sites. For example, for 100 locations,
the conventional approach would require about 5,000 WAN links, while
the multi-hop approach of the present invention would typically
require 300 or fewer (possibly considerably fewer) WAN links.
Although specific WAN links for the multi-hop approach of the
invention would require higher bandwidth to carry the additional
traffic, the cost involved is very much smaller as compared to the
cost for the very much larger number of WAN links required by the
conventional approach.

At the endpoints of a wide-area call, the WAN switching multiplexer
routes audio/video signals directly from the WAN network interface
through an available codec to MLAN 10 and vice versa. At intermediate
hops in the network, however, video signals are routed from one
network interface on the WAN switching multiplexer to another network
interface. Although A/V Switching Circuitry 30 could be used for this
purpose, the preferred embodiment provides switching functionality
inside the WAN switching multiplexer. By doing so, it avoids having
to route audio/video signals through codecs to the analog switching
circuitry, thereby avoiding additional codec delays at the
intermediate locations.

A product capable of performing the basic switching functions
described above for WAN switching multiplexer 44 is available from
Teleos Corporation, Eatontown, NJ. This product is not known to have
been used for providing audio/video multi-hopping and dynamic
switching among various WAN links as described above.

In addition to the above-described multiple-hop approach, the
preferred embodiment of the present invention provides a particularly
advantageous way of minimizing delay, cost and degradation of video
quality in a multi-party video teleconference involving
geographically dispersed sites, while still delivering full
conference views of all participants. Normally, in order for the CMWs
at all sites to be provided with live audio/video of every
participant in a teleconference simultaneously, each site has to
allocate (in router/codec bank 42 in FIG. 4) a separate codec for
each participant, as well as a like number of WAN trunks (via WAN
switching multiplexer 44 in FIG. 4).

As will next be described, however, the preferred embodiment of the
invention advantageously permits each wide area audio/video
teleconference to use only one codec at each site, and a minimum
number of WAN digital trunks. Basically, the preferred embodiment
achieves this most important result by employing "distributed" video
mosaicing via a video "cut-and-paste" technology along with
distributed audio mixing.

DISTRIBUTED VIDEO MOSAICING

FIG. 7 illustrates a preferred way of providing video mosaicing in
the MLAN of FIG. 3, i.e., by combining the individual analog video
pictures from the individuals participating in a teleconference into
a single analog mosaic picture. As shown in FIG. 7, analog video
signals 112-1 to 112-n from the participants of a teleconference are
applied to video mosaicing circuitry 36, which in the preferred
embodiment is provided as part of conference bridge 35 in FIG. 3.
These analog video inputs 112-1 to 112-n are obtained from the A/V
Switching Circuitry 30 (FIG. 3) and may include video signals from
CMWs at one or more distant sites (received via WAN gateway 40) as
well as from other CMWs at the local site.

In the preferred embodiment, video mosaicing circuitry 36 is capable
of receiving N individual analog video picture signals (where N is a
squared integer, i.e., 4, 9, 16, etc.). Circuitry 36 first reduces
the size of the N input video signals by reducing the resolutions of
each by a factor of M (where M is the square root of N, i.e., 2, 3,
4, etc.), and then arranging them in an M-by-M mosaic of N images.
The resulting single analog mosaic 36a obtained from video mosaicing
circuitry 36 is then transmitted to the individual CMWs for display
on the screens thereof.

As will become evident hereinafter, it may be preferable to send a
different mosaic to distant sites, in which case video mosaicing
circuitry 36 would provide an additional mosaic 36b for this purpose.
A typical displayed mosaic picture (N=4, M=2) showing three
participants is illustrated in FIG. 2A. A mosaic containing four
participants is shown in FIG. 8B. It will be appreciated that, since
a mosaic (36a or 36b) can be transmitted as a single video picture to
another site, via WAN 15 (FIGS. 1 and 4), only one codec and digital
trunk are required. Of course, if only a single individual video
picture is required to be sent from a site, it may be sent directly
without being included in a mosaic.

Note that for large conferences it is possible to employ multiple
video mosaics, one for each video window supported by the CMWs (see,
e.g., FIG. 8C). In very large conferences, it is also possible to
display video only from a select focus group whose members are
selected by a dynamic "floor control" mechanism. Also note that, with
additional mosaic hardware, it is possible to give each CMW its own
mosaic. This can be used in small conferences to raise the maximum
number of participants (from M² to M²+1, i.e., 5, 10, 17, etc.) or to
give everyone in a large conference their own "focus group" view.

Also note that the entire video mosaicing approach described thus far
and continued below applies should digital video transmission be used
in lieu of analog transmission, particularly since both mosaic and
video window implementations use digital formats internally and in
current products are transformed to and from analog for external
interfacing. In particular, note that mosaicing can be done digitally
without decompression with many existing compression schemes.
Further, with an all-digital approach, mosaicing can be done as
needed directly on the CMW.

FIG. 9 illustrates preferred audio mixing circuitry 38 for use in
conjunction with the video mosaicing circuitry 36 in FIG. 7, both of
which may be part of conference bridges 35 in FIG. 3. As shown in
FIG. 9, audio signals 114-1 to 114-n are applied to audio summing
circuitry 38 for combination. These input audio signals 114-1 to
114-n may include audio signals from local participants as well as
audio sums from participants at distant sites. Audio mixing circuitry
38 provides a respective "minus-1" sum output 38a-1, 38a-2, etc. for
each participant. Thus, each participant hears every conference
participant's audio except his/her own.

In the preferred embodiment, sums are decomposed and formed in a
distributed fashion, creating partial sums at one
site which are completed at other sites by appropriate signal
insertion. Accordingly, audio mixing circuitry 38 is able to provide
one or more additional sums, such as indicated by output 38, for
sending to other sites having conference participants.

Next to be considered is the manner in which video cut-and-paste
techniques are advantageously employed in the preferred embodiment.
It will be understood that, since video mosaics and/or individual
video pictures may be sent from one or more other sites, the problem
arises as to how these situations are handled. In the preferred
embodiment, video cut-and-paste circuitry 39, as illustrated in FIG.
10, is provided for this purpose, and may also be incorporated in the
conference bridges 35 in FIG. 3.

Referring to FIG. 10, video cut-and-paste circuitry receives analog
video inputs 116, which may be comprised of one or more mosaics or
single video pictures received from one or more distant sites and a
mosaic or single video picture produced by the local site. It is
assumed that the local video mosaicing circuitry 36 (FIG. 7) and the
video cut-and-paste circuitry 39 have the capability of handling all
of the applied individual video pictures, or at least are able to
choose which ones are to be displayed based on existing available
signals.

The video cut-and-paste circuitry 39 digitizes the incoming analog
video inputs 116, selectively rearranges the digital signals on a
region-by-region basis to produce a single digital M-by-M mosaic,
having individual pictures in selected regions, and then converts the
resulting digital mosaic back to analog form to provide a single
analog mosaic picture 39a for sending to local participants (and
other sites where required) having the individual input video
pictures in appropriate regions. This resulting cut-and-paste analog
mosaic 39a will provide the same type of display as illustrated in
FIG. 8B. It is sometimes beneficial to send different cut-and-paste
mosaics to different sites, in which case video cut-and-paste
circuitry 39 provides additional cut-and-paste mosaics 39b-1, 39b-2,
etc. for this purpose.

FIG. 11 diagrammatically illustrates an example of how video
cut-and-paste circuitry may operate to provide the cut-and-paste
analog mosaic 39a. As shown, digitized individual signals 116a, 116b,
116c derived from the input video signals are "pasted" into selected
regions of a digital frame buffer 17 to form a digital 2x2 mosaic,
which is then converted to an analog mosaic 39a or 39b in FIG. 10.
The required audio partial sums may be provided by audio mixing
circuitry 38 in FIG. 9 in the same manner, replacing each
cut-and-paste video operation with a partial sum operation.

Having described in connection with FIGS. 7-11 how video mosaicing,
audio mixing, video cut-and-pasting, and distributed audio mixing may
be performed, the following description of FIGS. 12-17 will
illustrate how these capabilities may advantageously be used in
combination in the context of wide-area videoconferencing. For these
examples, the teleconference is assumed to have four participants
designated as A, B, C and D, in which case 2x2 (quad) mosaics are
employed. It is to be understood that greater numbers of participants
could be provided. Also, two or more simultaneously occurring
teleconferences could also be handled, in which case additional
mosaicing, cut-and-paste and audio mixing circuitry would be provided
at the various sites along with additional WAN paths. For each
example, the "A" figure illustrates the video mosaicing and
cut-and-pasting provided, and the corresponding "B" figure (having
the same figure number) illustrates the associated audio mixing
provided. Note that these figures indicate typical delays that might
be encountered for each example (with a single "UNIT" delay ranging
from 0-450 milliseconds, depending upon available compression
technology).

FIGS. 12A and 12B illustrate a 2-site example having two participants
A and B at Site #1 and two participants C and D at Site #2. Note that
this example requires mosaicing and cut-and-paste at both sites.

FIGS. 13A and 13B illustrate another 2-site example, but having three
participants A, B and C at Site #1 and one participant D at Site #2.
Note that this example requires mosaicing at both sites, but
cut-and-paste only at Site #2.

FIGS. 14A and 14B illustrate a 3-site example having participants A
and B at Site #1, participant C at Site #2, and participant D at Site
#3. At Site #1, the three local videos A, B and C are put into a
mosaic which is sent to both Site #2 and Site #3. At Site #2 and Site
#3, cut-and-paste is used to insert the single video (C or D) at that
site into the empty region in the imported A, B, C mosaic, as shown.
Accordingly, mosaicing is required at all three sites, and
cut-and-paste is required for only Site #2 and Site #3.

FIGS. 15A and 15B illustrate another 3-site example having
participant A at Site #1, participant B at Site #2, and participants
C and D at Site #3. Note that mosaicing and cut-and-paste are
required at all sites. Site #2 additionally has the capability to
send different cut-and-paste mosaics to Site #1 and Site #3. Further
note with respect to FIG. 15B that Site #2 creates minus-1 audio
mixes for Site #1 and Site #2, but only provides a partial audio mix
(A&B) for Site #3. These partial mixes are completed at Site #3 by
mixing in C's signal to complete D's mix (A+B+C) and D's signal to
complete C's mix (A+B+D).

FIG. 16 illustrates a 4-site example having one participant at each
site; that is, participant A is at Site #1, participant B is at Site
#2, participant C is at Site #3, and participant D is at Site #4. An
audio implementation is not illustrated for this example, since
standard minus-1 mixing can be performed at Site #1, and the
appropriate sums transmitted to the other sites.

FIGS. 17A and 17B illustrate a 4-site example, also having only one
participant at each site, but uses a line topology rather than a star
topology as in the example of FIG. 16. Note that this example
requires mosaicing and cut-and-paste at all sites. Also note that
Site #2 and Site #3 are each required to transmit two different types
of cut-and-paste mosaics.

The preferred embodiment also provides the capability of allowing a
conference participant to select a close-up of a participant
displayed on a mosaic. This capability is provided whenever a full
individual video picture is available at that user's site. In such
case, the A/V Switching Circuitry 30 (FIG. 3) switches the selected
full video picture (whether obtained locally or from another site) to
the CMW that requests the close-up.

Next to be described in connection with FIGS. 18A, 18B, 19 and 20 are
various preferred embodiments of a CMW in accordance with the
invention.

COLLABORATIVE MULTIMEDIA WORKSTATION HARDWARE

One embodiment of a CMW 12 of the present invention is illustrated in
FIG. 18A. Currently available personal computers (e.g., an Apple
Macintosh or an IBM-compatible PC, desktop or laptop) and
workstations (e.g., a Sun
SPARCstation) can be adapted to work with the present
invention to provide such features as real-time
videoconferencing, data conferencing, multimedia mail, etc.
In business situations, it can be advantageous to set up a
laptop to operate with reduced functionality via cellular
telephone links and removable storage media (e.g.,
CD-ROM, video tape with timecode support, etc.), but take
on full capability back in the office via a docking station
connected to the MLAN 10. This requires a voice and data
modem as yet another function server attached to the
MLAN.

The currently available personal computers and workstations
serve as a base workstation platform. The addition of
certain audio and video I/O devices to the standard components
of the base platform 100 (where standard components
include the display monitor 200, keyboard 300 and mouse or
tablet (or other pointing device) 400), all of which connect
with the base platform box through standard peripheral ports
101, 102 and 103, enables the CMW to generate and receive
real-time audio and video signals. These devices include a
video camera 500 for capturing the user's image, gestures
and surroundings (particularly the user's face and upper
body), a microphone 600 for capturing the user's spoken
words (and any other sounds generated at the CMW), a
speaker 700 for presenting incoming audio signals (such as
the spoken words of another participant to a videoconference
or audio annotations to a document), a video input card
130 in the base platform 100 for capturing incoming video
signals (e.g., the image of another participant to a
videoconference, or videomail), and a video display card
120 for displaying video and graphical output on monitor
200 (where video is typically displayed in a separate
window).

These peripheral audio and video I/O devices are readily
available from a variety of vendors and are just beginning to
become standard features in (and often physically integrated
into the monitor and/or base platform of) certain personal
computers and workstations. See, e.g., the aforementioned
BYTE article ("Video Conquers the Desktop"), which
describes current models of Apple's Macintosh AV series
personal computers and Silicon Graphics' Indy workstations.

Add-on box 800 (shown in FIG. 18A and illustrated in
greater detail in FIG. 19) integrates these audio and video
I/O devices with additional functions (such as adaptive echo
canceling and signal switching) and interfaces with AV
Network 901. AV Network 901 is the part of the MLAN 10
which carries bidirectional audio and video signals among
the CMWs and A/V Switching Circuitry 30, e.g., utilizing
existing UTP wiring to carry audio and video signals (digital
or analog, as in the present embodiment).

In the present embodiment, the AV Network 901 is separate
and distinct from the Data Network 902 portion of the
MLAN 10, which carries bidirectional data signals among
the CMWs and the Data LAN hub (e.g., an Ethernet network
that also utilizes UTP wiring in the present embodiment with
a network interface card 110 in each CMW). Note that each
CMW will typically be a node on both the AV and the Data
Networks.

There are several approaches to implementing Add-on
box 800. In a typical videoconference, video camera 500 and
microphone 600 capture and transmit outgoing video and
audio signals into ports 801 and 802, respectively, of Add-on
box 800. These signals are transmitted via Audio/Video I/O
port 805 across AV Network 901. Incoming video and audio
signals (from another videoconference participant) are
received across AV Network 901 through Audio/Video I/O
port 805. The video signals are sent out of V-OUT port 803
of CMW add-on box 800 to video input card 130 of base
platform 100, where they are displayed (typically in a
separate video window) on monitor 200 utilizing the standard
base platform video display card 120. The audio signals
are sent out of A-OUT port 804 of CMW add-on box 800
and played through speaker 700 while the video signals are
displayed on monitor 200. The same signal flow occurs for
other non-teleconferencing applications of audio and video.

Add-on box 800 can be controlled by CMW software 
(illustrated in FIG. 20) executed by base platform 100. 
Control signals can be communicated between base platform 
port 104 and Add-on box Control port 806 (e.g., an RS-232, 
Centronics, SCSI or other standard communications port). 

Many other embodiments of the CMW illustrated in FIG.
18A will work in accordance with the present invention. For
example, Add-on box 800 itself can be implemented as an
add-in card to the base platform 100. Connections to the
audio and video I/O devices need not change, though the
connection for base platform control can be implemented
internally (e.g., via the system bus) rather than through an
external RS-232 or SCSI peripheral port. Various additional
levels of integration can also be achieved as will be evident
to those skilled in the art. For example, microphones,
speakers, video cameras and UTP transceivers can be integrated
into the base platform 100 itself, and all media
handling technology and communications can be integrated
onto a single card.

A handset/headset jack enables the use of an integrated 
audio I/O device as an alternate to the separate microphone 
and speaker. A telephone interface could be integrated into 
add-on box 800 as a local implementation of computer- 
integrated telephony. A "hold" (i.e., audio and video mute) 
switch and/or a separate audio mute switch could be added 
to Add-on box 800 if such an implementation were deemed 
preferable to a software-based interface. 

The internals of Add-on box 800 of FIG. 18A are illustrated
in FIG. 19. Video signals generated at the CMW (e.g.,
captured by camera 500 of FIG. 18A) are sent to CMW
add-on box 800 via V-IN port 801. They then typically pass
unaffected through Loopback/AV Mute circuitry 830 via
video ports 833 (input) and 834 (output) and into A/V
Transceivers 840 (via Video In port 842) where they are
transformed from standard video cable signals to UTP
signals and sent out via port 845 and Audio/Video I/O port
805 onto AV Network 901.

The Loopback/AV Mute circuitry 830 can, however, be 
placed in various modes under software control via Control 
port 806 (implemented, for example, as a standard UART). 
If in loopback mode (e.g., for testing incoming and outgoing 
signals at the CMW), the video signals would be routed back 
out V-OUT port 803 via video port 831. If in a mute mode 
(e.g., muting audio, video or both), video signals might, for 
example, be disconnected and no video signal would be sent 
out video port 834. Loopback and muting switching func- 
tionality is also provided for audio in a similar way. Note 
that computer control of loopback is very useful for remote 
testing and diagnostics while manual override of computer 
control on mute is effective for assured privacy from use of 
the workstation for electronic spying. 
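
The mode behavior of the Loopback/AV Mute circuitry described above can be summarized as a small routing function. The following is a minimal sketch, assuming a simple three-mode model and a hypothetical `manual_mute` flag standing in for a manual override switch; the `Mode` enum, function names and hop labels are illustrative only and are not taken from the specification. Only the outgoing (camera) video path is modeled.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    NORMAL = "normal"
    LOOPBACK = "loopback"
    MUTE = "mute"


# Ordered hops for locally captured video, labeled with the reference
# numerals of FIG. 19. These tuples are an illustrative model only.
TO_NETWORK: Tuple[str, ...] = (
    "V-IN port 801", "video port 833", "video port 834",
    "Video In port 842", "port 845", "Audio/Video I/O port 805",
    "AV Network 901",
)
TO_LOCAL_DISPLAY: Tuple[str, ...] = (
    "V-IN port 801", "video port 833", "video port 831", "V-OUT port 803",
)
BLOCKED: Tuple[str, ...] = ("V-IN port 801", "video port 833")


def effective_mode(software_mode: Mode, manual_mute: bool) -> Mode:
    """Assumed policy: an engaged manual mute switch overrides software."""
    return Mode.MUTE if manual_mute else software_mode


def route_outgoing_video(software_mode: Mode,
                         manual_mute: bool = False) -> Tuple[str, ...]:
    """Return the hops taken by camera video for the requested mode."""
    mode = effective_mode(software_mode, manual_mute)
    if mode is Mode.LOOPBACK:
        return TO_LOCAL_DISPLAY   # routed back out V-OUT for local testing
    if mode is Mode.MUTE:
        return BLOCKED            # nothing is sent out video port 834
    return TO_NETWORK             # passes unaffected to the transceivers
```

Giving the manual switch priority in `effective_mode` reflects the privacy argument made above: software alone cannot re-enable a camera that the user has muted by hand.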

Video input (e.g., captured by the video camera at the
CMW of another videoconference participant) is handled in
a similar fashion. It is received along AV Network 901
through Audio/Video I/O port 805 and port 845 of A/V
Transceivers 840, where it is sent out Video Out port 841 to
video port 832 of Loopback/AV Mute circuitry 830, which typically
passes such signals out video port 831 to V-OUT port 803 (for receipt
by a video input card or other display mechanism, such as LCD display
810 of CMW Side Mount unit 850 in FIG. 18B, to be discussed).

Audio input and output (e.g., for playback through speaker 700 and
capture by microphone 600 of FIG. 18A) passes through A/V
Transceivers 840 (via Audio In port 844 and Audio Out port 843) and
Loopback/AV Mute circuitry 830 (through audio ports 837/838 and
836/835) in a similar manner. The audio input and output ports of
Add-on box 800 interface with standard amplifier and equalization
circuitry, as well as an adaptive room echo canceler 814 to eliminate
echo, minimize feedback and provide enhanced audio performance when
using a separate microphone and speaker. In particular, use of
adaptive room echo cancelers provides high-quality audio interactions
in wide area conferences. Because adaptive room echo canceling
requires training periods (typically involving an objectionable blast
of high-amplitude white noise or tone sequences) for alignment with
each acoustic environment, it is preferred that separate echo
canceling be dedicated to each workstation rather than sharing a
smaller group of echo cancelers across a larger group of
workstations.

Audio inputs passing through audio port 835 of Loopback/AV Mute
circuitry 830 provide audio signals to a speaker (via standard Echo
Canceler circuitry 814 and A-OUT port 804) or to a handset or headset
(via I/O ports 807 and 808, respectively, under volume control
circuitry 815 controlled by software through Control port 806). In
all cases, incoming audio signals pass through power amplifier
circuitry 812 before being sent out of Add-on box 800 to the
appropriate audio-emitting transducer.

Outgoing audio signals generated at the CMW (e.g., by microphone 600
of FIG. 18A or the mouthpiece of a handset or headset) enter Add-on
box 800 via A-IN port 802 (for a microphone) or Handset or Headset
I/O ports 807 and 808, respectively. In all cases, outgoing audio
signals pass through standard preamplifier (811) and equalization
(813) circuitry, whereupon the desired signal is selected by standard
"Select" switching circuitry 816 (under software control through
Control port 806) and passed to audio port 837 of Loopback/AV Mute
circuitry 830.

It is to be understood that A/V Transceivers 840 may include
muxing/demuxing facilities so as to enable the transmission of
audio/video signals on a single pair of wires, e.g., by encoding
audio signals digitally in the vertical retrace interval of the
analog video signal. Implementation of other audio and video
enhancements, such as stereo audio and external audio/video I/O ports
(e.g., for recording signals generated at the CMW), are also well
within the capabilities of one skilled in the art. If stereo audio is
used in teleconferencing (i.e., to create useful spatial metaphors
for users), a second echo canceler may be recommended.

Another embodiment of the CMW of this invention, illustrated in FIG.
18B, utilizes a separate (fully self-contained) "Side Mount" approach
which includes its own

the necessity of external connections to these integrated audio and
video I/O devices, and includes an LCD display 810 for displaying the
incoming video signal (which thus eliminates the need for a base
platform video input card 130).

Given the proximity of Side Mount device 850 to the user, and the
direct access to audio/video I/O within that device, various
additional controls 820 can be provided at the user's touch (all well
within the capabilities of those skilled in the art). Note that, with
enough additions, Side Mount unit 850 can become virtually a
standalone device that does not require a separate computer for
services using only audio and video. This also provides a way of
supplementing a network of full-feature workstations with a few
low-cost additional "audio video intercoms" for certain sectors of an
enterprise (such as clerical, reception, factory floor, etc.).

A portable laptop implementation can be made to deliver multimedia
mail with video, audio and synchronized annotations via CD-ROM or an
add-on videotape unit with separate video, audio and time code tracks
(a stereo videotape player can use the second audio channel for time
code signals). Videotapes or CD-ROMs can be created in main offices
and express mailed, thus avoiding the need for high-bandwidth
networking when on the road. Cellular phone links can be used to
obtain both voice and data communications (via modems). Modem-based
data communications are sufficient to support remote control of mail
or presentation playback, annotation, file transfer and fax features.
The laptop can then be brought into the office and attached to a
docking station where the available MLAN 10 and additional functions
adapted from Add-on box 800 can be supplied, providing full CMW
capability.

CMW software modules 160 are illustrated generally in FIG. 20 and
discussed in greater detail below in conjunction with the software
running on MLAN Server 60 of FIG. 3. Software 160 allows the user to
initiate and manage (in conjunction with the server software)
videoconferencing, data conferencing, multimedia mail and other
collaborative sessions with other users across the network.

Also present on the CMW in this embodiment are standard multitasking
operating system/GUI software 180 (e.g., Apple Macintosh System 7,
Microsoft Windows 3.1, or UNIX with the "X Window System" and Motif
or other GUI "window manager" software) as well as other applications
170, such as word processing and spreadsheet programs. Software
modules 161-168 communicate with operating system/GUI software 180
and other applications 170 utilizing standard function calls and
interapplication protocols.

The central component of the Collaborative Multimedia Workstation
software is the Collaboration Initiator 161. All collaborative
functions can be accessed through this module. When the Collaboration
Initiator is started, it exchanges initial configuration information
with the Audio Video Network Manager (AVNM) 60 (shown in FIG. 3)
through Data
dedicated video display. This embodiment is advantageous Network 902. Information is also sent from the Collabora- 
in a variety of situations, such as instances in which addi- 60 tion Initiator to the AVNM indicating the location of the 
tional screen display area is desired (eg., in a laptop user, me types of services available on that workstation (e.g., 
computer or desktop system with a small monitor) or where videoconferencing, data conferencing, telephony, etc.) and 
it is impossible or undesirable to retrofit older, existing or other relevant initialization information, 
specialized desktop computers for audio/video support In The Collaboration Initiator presents a user interface that 
this embodiment, video camera 500, microphone 600 and 65 allows the user to initiate collaborative sessions (both real- 
speaker 700 of FIG. 18A are integrated together with the time and asynchronous). In the preferred embodiment, ses- 
functionality of Add-on box 800. Side Mount 850 eliminates sion participants can be selected from a graphical rolodex 
163 that contains a scrollable list of user names or from a list
of quick-dial buttons 162. Quick-dial buttons show the face
icons for the users they represent. In the preferred
embodiment, the icon representing the user is retrieved by
the Collaboration Initiator from the Directory Server 66 on
MLAN Server 60 when it starts up. Users can dynamically
add new quick-dial buttons by dragging the corresponding
entries from the graphical rolodex onto the quick-dial panel.

Once the user elects to initiate a collaborative session, he
or she selects one or more desired participants by, for
example, clicking on that name to select the desired
participant from the system rolodex or a personal rolodex, or by
clicking on the quick-dial button for that participant (see,
e.g., FIG. 2A). In either case, the user then selects the
desired session type, e.g., by clicking on a CALL button to
initiate a videoconference call, a SHARE button to initiate
the sharing of a snapshot image or blank whiteboard, or a
MAIL button to send mail. Alternatively, the user can
double-click on the rolodex name or a face icon to initiate
the default session type, e.g., an audio/video conference
call.

The system also allows sessions to be invoked from the
keyboard. It provides a graphical editor to bind
combinations of participants and session types to certain hot keys.
Pressing this hot key (possibly in conjunction with a
modifier key, e.g., <Shift> or <Ctrl>) will cause the Collaboration
Initiator to start a session of the specified type with the given
participants.

Once the user selects the desired participant and session
type, Collaboration Initiator module 161 retrieves necessary
addressing information from Directory Service 66 (see FIG.
21). In the case of a videoconference call, the Collaboration
Initiator then communicates with the AVNM (as described in
greater detail below) to set up the necessary data structures
and manage the various states of that call, and to control A/V
Switching Circuitry 30, which selects the appropriate audio
and video signals to be transmitted to/from each
participant's CMW. In the case of a data conferencing session, the
Collaboration Initiator locates, via the AVNM, the
Collaboration Initiator modules at the CMWs of the chosen
recipients, and sends a message causing the Collaboration
Initiator modules to invoke the Snapshot Sharing modules
164 at each participant's CMW. Subsequent
videoconferencing and data conferencing functionality is discussed in
greater detail below in the context of particular usage
scenarios.

As indicated previously, additional collaborative
services (such as Mail 165, Application Sharing 166,
Computer-Integrated Telephony 167 and Computer-Integrated
Fax 168) are also available from the CMW by
utilizing Collaboration Initiator module 161 to initiate the
session (i.e., to contact the participants) and to invoke the
appropriate application necessary to manage the
collaborative session. When initiating asynchronous collaboration
(e.g., mail, fax, etc.), the Collaboration Initiator contacts
Directory Service 66 for address information (e.g., EMAIL
address, fax number, etc.) for the selected participants and
invokes the appropriate collaboration tools with the obtained
address information. For real-time sessions, the
Collaboration Initiator queries the Service Server module 69 inside
AVNM 63 for the current location of the specified
participants. Using this location information, it communicates (via
the AVNM) with the Collaboration Initiators of the other
session participants to coordinate session setup. As a result,
the various Collaboration Initiators will invoke modules
166, 167 or 168 (including activating any necessary devices
such as the connection between the telephone and the
CMW's audio I/O port). Further details on multimedia mail
are provided below.

MLAN SERVER SOFTWARE

FIG. 21 diagrammatically illustrates software 62
comprised of various modules (as discussed above) provided for
running on MLAN Server 60 (FIG. 3) in the preferred
embodiment. It is to be understood that additional software
modules could also be provided. It is also to be understood
that, although the software illustrated in FIG. 21 offers
various significant advantages as will become evident
hereinafter, different forms and arrangements of software
may also be employed within the scope of the invention. The
software can also be implemented in various sub-parts
running as separate processes.

In the preferred embodiment, clients (e.g.,
software-controlling workstations, VCRs, laserdiscs, multimedia
resources, etc.) communicate with the MLAN Server
Software Modules 62 using the TCP/IP network protocols.
Generally, the AVNM 63 cooperates with the Service Server
69, Conference Bridge Manager (CBM 64 in FIG. 21) and
WAN Network Manager (WNM 65 in FIG. 21) to
manage communications within and among both MLANs 10
and WANs 15 (FIGS. 1 and 3).

The AVNM additionally cooperates with Audio/Video
Storage Server 67 and other multimedia services 68 in FIG.
21 to support various types of collaborative interactions as
described herein. CBM 64 in FIG. 21 operates as a client of
the AVNM 63 to manage conferencing by controlling the
operation of conference bridges 35. This includes
management of the video mosaicing circuitry 37, audio mixing
circuitry 38 and cut-and-paste circuitry 39 preferably
incorporated therein. WNM 65 manages the allocation of paths
(codecs and trunks) provided by WAN gateway 40 for
accomplishing the communications to other sites called for
by the AVNM.

Audio Video Network Manager

The AVNM 63 manages A/V Switching Circuitry 30 in
FIG. 3 for selectively routing audio/video signals to and
from CMWs 12, and also to and from WAN gateway 40, as
called for by clients. Audio/video devices (e.g., CMWs 12,
conference bridges 35, multimedia resources 16 and WAN
gateway 40 in FIG. 3) connected to A/V Switching Circuitry
30 in FIG. 3 have physical connections for audio in, audio
out, video in and video out. For each device on the network,
the AVNM combines these four connections into a port
abstraction, wherein each port represents an addressable
bidirectional audio/video channel. Each device connected to
the network has at least one port. Different ports may share
the same physical connections on the switch. For example,
a conference bridge may typically have four ports (for 2x2
mosaicing) that share the same video-out connection. Not all
devices need both video and audio connections at a port. For
example, a TV tuner port needs only incoming audio/video
connections.

In response to client program requests, the AVNM
provides connectivity between audio/video devices by
connecting their ports. Connecting ports is achieved by switching
one port's physical input connections to the other port's
physical output connections (for both audio and video) and
vice-versa. Client programs can specify which of the 4
physical connections on its ports should be switched. This
allows client programs to establish unidirectional calls (e.g.,
by specifying that only the port's input connections should
be switched and not the port's output connections) and
audio-only or video-only calls (by specifying audio
connections only or video connections only).
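
By way of illustration only, the port abstraction and the selective switching of a port's four connections may be sketched in Python as follows. The names `Conn` and `connect_ports`, the string port labels, and the convention that a port's "in" connection carries signals toward the device are assumptions made for this sketch; they are not taken from the preferred embodiment.

```python
from enum import Flag, auto


class Conn(Flag):
    """The four physical switch connections grouped into one port (hypothetical names)."""
    AUDIO_IN = auto()
    AUDIO_OUT = auto()
    VIDEO_IN = auto()
    VIDEO_OUT = auto()
    ALL = AUDIO_IN | AUDIO_OUT | VIDEO_IN | VIDEO_OUT


def connect_ports(a, b, conns=Conn.ALL):
    """Return the (source, destination) crosspoints needed to connect port a to port b.

    `conns` selects which of port a's four connections are switched.
    """
    routes = set()
    for media, in_flag, out_flag in (
        ("audio", Conn.AUDIO_IN, Conn.AUDIO_OUT),
        ("video", Conn.VIDEO_IN, Conn.VIDEO_OUT),
    ):
        if in_flag in conns:   # a receives: b's output is switched to a's input
            routes.add((f"{b}.{media}_out", f"{a}.{media}_in"))
        if out_flag in conns:  # a sends: a's output is switched to b's input
            routes.add((f"{a}.{media}_out", f"{b}.{media}_in"))
    return routes


# A unidirectional call: only the input connections of port "ws1" are switched.
one_way = connect_ports("ws1", "ws2", Conn.AUDIO_IN | Conn.VIDEO_IN)
```

Passing `Conn.AUDIO_IN | Conn.AUDIO_OUT` instead would yield an audio-only call in both directions.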

Service Server 

Before client programs can access audio/video resources 
through the AVNM, they must register the collaborative 
services they provide with the Service Server 69. Examples 
of these services include "video call", "snapshot sharing",
"conference" and "video file sharing." These service records
are entered into the Service Server's service database. The
service database thus keeps track of the location of client
programs and the types of collaborative sessions in which 
they can participate. This allows the Collaboration Initiator 
to find collaboration participants no matter where they are 
located. The service database is replicated by all Service 
Servers: Service Servers communicate with other Service 
Servers in other MLANs throughout the system to exchange 
their service records. 

Clients may create a plurality of services, depending on 
the collaborative capabilities desired. When creating a 
service, a client can specify the network resources (e.g. 
ports) that will be used by this service. In particular, service 
information is used to associate a user with the audio/video 
ports physically connected to the particular CMW into 
which the user is logged in. Clients that want to receive 
requests do so by putting their services in listening mode. If 
clients want to accept incoming data shares, but want to 
block incoming video calls, they must create different ser- 
vices. 

A client can create an exclusive service on a set of ports
to prevent other clients from creating services on these ports. 
This is useful, for example, to prevent multiple conference 
bridges from managing the same set of conference bridge 
ports. 
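
The registration behaviour described above can be sketched, under stated assumptions, as a small in-memory service database. The class and field names (`ServiceDatabase`, `create`, `find`, `listening`) are hypothetical, replication between Service Servers is omitted, and the rule that only listening services are returned by a lookup is an interpretation of the text rather than a documented interface.

```python
class ServiceError(Exception):
    """Raised when a service cannot be created on the requested ports."""


class ServiceDatabase:
    def __init__(self):
        self.records = []             # one dict per registered service
        self.exclusive_ports = set()  # ports claimed by an exclusive service

    def create(self, client, service_type, name, ports=(), exclusive=False):
        ports = frozenset(ports)
        if ports & self.exclusive_ports:
            raise ServiceError("port already held by an exclusive service")
        record = {"client": client, "type": service_type, "name": name,
                  "ports": ports, "listening": False}
        if exclusive:
            self.exclusive_ports |= ports
        self.records.append(record)
        return record

    def find(self, service_type, name):
        """Return a matching service that is in listening mode, else None."""
        for record in self.records:
            if (record["type"] == service_type and record["name"] == name
                    and record["listening"]):
                return record
        return None


db = ServiceDatabase()
# Separate services let a client accept data shares while blocking video calls.
call = db.create("ci@ws2", "video call", "bob", ports={"82"})
share = db.create("ci@ws2", "snapshot sharing", "bob")
share["listening"] = True
```

In this sketch a second conference bridge manager that tries to create a service on ports already held exclusively receives a `ServiceError`.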

Next to be considered is the preferred manner in which the 
AVNM 63 (FIG. 21), in cooperation with the Service Server 
69, CBM 64 and participating CMWs, provides for managing
A/V Switching Circuitry 30 and conference bridges 35 in
FIG. 3 during audio/video/data teleconferencing. The par- 
ticipating CMWs may include workstations located at both 
local and remote sites. 

BASIC TWO-PARTY VIDEOCONFERENCING 

As previously described, a CMW includes a Collaboration
Initiator software module 161 (see FIG. 20), which is
used to establish person-to-person and multiparty calls. The 
corresponding collaboration initiator window advanta- 
geously provides quick-dial face icons of frequently dialed 
persons, as illustrated, for example, in FIG. 22, which is an 
enlarged view of typical face icons along with various 
initiating buttons (described in greater detail below in con- 
nection with FIGS. 35-42). 

Videoconference calls can be initiated, for example, 
merely by double-clicking on these icons. When a call is 
initiated, the CMW typically provides a screen display that
includes a live video picture of the remote conference 
participant, as illustrated for example in FIG. 8A. In the 
preferred embodiment, this display also includes control 
buttons/menu items that can be used to place the remote 
participant on hold, to resume a call on hold, to add one or 
more participants to the call, to initiate data sharing and to 
hang up the call. 

The basic underlying software-controlled operations 
occurring for a two-party call are diagrammatically illus- 
trated in FIG. 23. When a caller initiates a call (e.g., by 
selecting a user from the graphical rolodex and clicking the
call button or by double-clicking the face icon of the callee
on the quick-dial panel), the caller's Collaboration Initiator
responds by identifying the selected user and requesting that
user's address from Directory Service 66, as indicated by (2)
in FIG. 23. Directory Service 66 looks up the callee's
address in the directory database, as indicated by (3) in FIG.
23, and then returns it to the caller's Collaboration Initiator,
as illustrated by (4) in FIG. 23.

The caller's Collaboration Initiator sends a request to the
AVNM to place a video call to the callee with the specified
address, as indicated by (5) in FIG. 23. The AVNM queries
the Service Server to find the service instance of type "video
call" whose name corresponds to the callee's address. This
service record identifies the location of the callee's
Collaboration Initiator as well as the network ports that the callee is
connected to. If no service instance is found for the callee,
the AVNM notifies the caller that the callee is not logged in.
If the callee is local, the AVNM sends a call event to the
callee's Collaboration Initiator, as indicated by (6) in FIG.
23. If the callee is at a remote site, the AVNM forwards the
call request (5) through the WAN gateway 40 for
transmission, via WAN 15 (FIG. 1), to the Collaboration
Initiator of the callee's CMW at the remote site.

The callee's Collaboration Initiator can respond to the call
event in a variety of ways. In the preferred embodiment, a
user-selectable sound is generated to announce the incoming
call. The Collaboration Initiator can then act in one of two
modes. In "Telephone Mode," the Collaboration Initiator
displays an invitation message on the CMW screen that
contains the name of the caller and buttons to accept or
refuse the call. The Collaboration Initiator will then accept
or refuse the call, depending on which button is pressed by
the callee. In "Intercom Mode," the Collaboration Initiator
accepts all incoming calls automatically, unless there is
already another call active on the callee's CMW, in which
case behavior reverts to Telephone Mode.
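
The two answering modes reduce to a short decision rule, sketched here in Python. The function name, the mode strings and the `user_accepts` callback (standing in for the invitation message with its accept and refuse buttons) are assumptions made for illustration.

```python
def respond_to_call(mode, other_call_active, user_accepts):
    """Return True to accept an incoming call, False to refuse it.

    mode: "telephone" or "intercom" (hypothetical labels for the two modes)
    other_call_active: whether another call is already active on this CMW
    user_accepts: callable that prompts the callee and returns the choice
    """
    if mode == "intercom" and not other_call_active:
        return True        # Intercom Mode: accept automatically
    return user_accepts()  # Telephone Mode, or Intercom Mode while busy
```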

The callee's Collaboration Initiator then notifies the
AVNM as to whether the call will be accepted or refused. If
the call is accepted, (7), the AVNM sets up the necessary
communication paths between the caller and the callee
required to establish the call. The AVNM then notifies the
caller's Collaboration Initiator that the call has been
established by sending it an accept event (8). If the caller and
callee are at different sites, their AVNMs will coordinate in
setting up the communication paths at both sites, as required
by the call.
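
The sequence of steps (2) through (8) for a local call can be traced with the following Python sketch. It is a simplified model under stated assumptions: the directory and the service database are plain dictionaries, the record fields `initiator` and `port` are hypothetical, and the remote-site path through the WAN gateway is not modeled.

```python
def place_video_call(caller_port, callee_name, directory, services, callee_accepts):
    """Walk through steps (2)-(8) of a local two-party call and log each one."""
    log = []
    address = directory.get(callee_name)            # (2)-(4) directory lookup
    log.append(f"(4) directory returned {address!r}")
    record = services.get(("video call", address))  # (5) AVNM queries Service Server
    if record is None:
        log.append("callee is not logged in")
        return "not-logged-in", log
    log.append(f"(6) call event sent to {record['initiator']}")
    if not callee_accepts():
        log.append("call refused")
        return "refused", log
    log.append(f"(7) ports {caller_port} and {record['port']} connected")
    log.append("(8) accept event sent to caller")
    return "established", log


directory = {"bob": "bob@mlan-a"}
services = {("video call", "bob@mlan-a"): {"initiator": "ci@ws2", "port": "82"}}
status, log = place_video_call("81", "bob", directory, services, lambda: True)
```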

The AVNM may provide for managing connections
among CMWs and other multimedia resources for
audio/video/data communications in various ways. The manner
employed in the preferred embodiment will next be
described.

As has been described previously, the AVNM manages the
switches in the A/V Switching Circuitry 30 in FIG. 3 to
provide port-to-port connections in response to connection
requests from clients. The primary data structure used by the
AVNM for managing these connections will be referred to as
a callhandle, which is comprised of a plurality of bits,
including state bits.

Each port-to-port connection managed by the AVNM
comprises two callhandles, one associated with each end of
the connection. The callhandle at the client port of the
connection permits the client to manage the client's end of
the connection. The callhandle mode bits determine the
current state of the callhandle and which of a port's four
switch connections (video in, video out, audio in, audio out)
are involved in a call.
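
One possible encoding of a callhandle as a small bit field is sketched below in Python. The bit layout, the mask value and all identifiers are hypothetical, since the specification does not give them. The state names (idle, ringing, active, hold) and the permitted transitions follow those named elsewhere in this description.

```python
from enum import IntEnum, IntFlag


class State(IntEnum):
    IDLE = 0
    RINGING = 1
    ACTIVE = 2
    HOLD = 3


class Mode(IntFlag):
    VIDEO_IN = 0x04
    VIDEO_OUT = 0x08
    AUDIO_IN = 0x10
    AUDIO_OUT = 0x20


STATE_MASK = 0x03  # low two bits hold the state (hypothetical layout)

# Transitions named in the description; anything else is rejected here.
ALLOWED = {
    State.IDLE: {State.ACTIVE, State.RINGING},
    State.RINGING: {State.ACTIVE},
    State.ACTIVE: {State.HOLD},
    State.HOLD: {State.ACTIVE},
}


def pack(state, mode):
    """Combine a state and a set of switch connections into one integer."""
    return int(state) | int(mode)


def state_of(handle):
    return State(handle & STATE_MASK)


def mode_of(handle):
    return Mode(handle & ~STATE_MASK)


def transition(handle, new_state):
    """Return the callhandle with its state bits replaced, if the move is allowed."""
    if new_state not in ALLOWED[state_of(handle)]:
        raise ValueError(f"{state_of(handle).name} -> {new_state.name} not allowed")
    return (handle & ~STATE_MASK) | int(new_state)


# An audio-only callee handle: created idle, rung, accepted, then put on hold.
h = pack(State.IDLE, Mode.AUDIO_IN | Mode.AUDIO_OUT)
for step in (State.RINGING, State.ACTIVE, State.HOLD):
    h = transition(h, step)
```

Keeping the state and the connection selection in one word means a state change never disturbs which connections the call uses.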
AVNM clients send call requests to the AVNM whenever
they want to initiate a call. As part of a call request, the client
specifies the local service in which the call will be involved,
the name of the specific port to use for the call, identifying
information as to the callee, and the call mode. In response,
the AVNM creates a callhandle on the caller's port.

All callhandles are created in the "idle" state. The AVNM
then puts the caller's callhandle in the "active" state. The
AVNM next creates a callhandle for the callee and sends it
a call event, which places the callee's callhandle in the
"ringing" state. When the callee accepts the call, its
callhandle is placed in the "active" state, which results in a
physical connection between the caller and the callee. Each
port can have an arbitrary number of callhandles bound to it,
but typically only one of these callhandles can be active at
the same time.

After a call has been set up, AVNM clients can send
requests to the AVNM to change the state of the call, which
can advantageously be accomplished by controlling the
callhandle states. For example, during a call, a call request
from another party could arrive. This arrival could be
signaled to the user by providing an alert indication in a
dialog box on the user's CMW screen. The user could refuse
the call by clicking on a refuse button in the dialog box, or
by clicking on a "hold" button on the active call window to
put the current call on hold and allow the incoming call to
be accepted.

The placing of the currently active call on hold can
advantageously be accomplished by changing the caller's
callhandle from the active state to a "hold" state, which
permits the caller to answer incoming calls or initiate new
calls, without releasing the previous call. Since the
connection set-up to the callee will be retained, a call on hold can
conveniently be resumed by the caller clicking on a resume
button on the active call window, which returns the
corresponding callhandle back to the active state. Typically,
multiple calls can be put on hold in this manner. As an aid
in managing calls that are on hold, the CMW
advantageously provides a hold list display, identifying these
on-hold calls and (optionally) the length of time that each
party is on hold. A corresponding face icon could be used to
identify each on-hold call. In addition, buttons could be
provided in this hold display which would allow the user to
send a preprogrammed message to a party on hold. For
example, this message could advise the callee when the call
will be resumed, or could state that the call is being
terminated and will be reinitiated at a later time.

Reference is now directed to FIG. 24 which
diagrammatically illustrates how two-party calls are connected for
CMWs WS-1 and WS-2, located at the same MLAN 10. As
shown in FIG. 24, CMWs WS-1 and WS-2 are coupled to the
local A/V Switching Circuitry 30 via ports 81 and 82,
respectively. As previously described, when CMW WS-1
calls CMW WS-2, a callhandle is created for each port. If
CMW WS-2 accepts the call, these two callhandles become
active and in response thereto, the AVNM causes the A/V
Switching Circuitry 30 to set up the appropriate connections
between ports 81 and 82, as indicated by the dashed line 83.

FIG. 25 diagrammatically illustrates how two-party calls
are connected for CMWs WS-1 and WS-2 when located in
different MLANs 10a and 10b. As illustrated in FIG. 25,
CMW WS-1 of MLAN 10a is connected to a port 91a of A/V
Switching Circuitry 30a of MLAN 10a, while CMW WS-2
is connected to a port 91b of the audio/video switching
circuitry 30b of MLAN 10b. It will be assumed that MLANs
10a and 10b can communicate with each other via ports 92a
and 92b (through respective WAN gateways 40a and 40b
and WAN 15). A call between CMWs WS-1 and WS-2 can
then be established by AVNM of MLAN 10a in response to
the creation of callhandles at ports 91a and 92a, setting up
appropriate connections between these ports as indicated by
dashed line 93a, and by AVNM of MLAN 10b in response
to callhandles created at ports 91b and 92b, setting up
appropriate connections between these ports as indicated by
dashed line 93b. Appropriate paths 94a and 94b in WAN
gateways 40a and 40b, respectively, are set up by the WAN
network manager 65 (FIG. 21) in each network.

CONFERENCE CALLS

Next to be described is the specific manner in which the
preferred embodiment provides for multi-party conference
calls (involving more than two participants). When a
multi-party conference call is initiated, the CMW provides a
screen that is similar to the screen for two-party calls, which
displays a live video picture of the callee's image in a video
window. However, for multi-party calls, the screen includes
a video mosaic containing a live video picture of each of the
conference participants (including the CMW user's own
picture), as shown, for example, in FIG. 8B. Of course, other
embodiments could show only the remote conference
participants (and not the local CMW user) in the conference
mosaic (or a mosaic containing both participants in a
two-party call). In addition to the controls shown in FIG. 8B,
the multi-party conference screen also includes
buttons/menu items that can be used to place individual conference
participants on hold, to remove individual participants from
the conference, to adjourn the entire conference, or to
provide a "close-up" image of a single individual (in place
of the video mosaic).

Multi-party conferencing requires all the mechanisms
employed for 2-party calls. In addition, it requires the
conference bridge manager CBM 64 (FIG. 21) and the
conference bridges 36 (FIG. 3). The CBM acts as a client of
the AVNM in managing the operation of the conference
bridges 36. The CBM also acts as a server to other clients on
the network. The CBM makes conferencing services
available by creating service records of type "conference" in the
AVNM service database and associating these services with
the ports on A/V Switching Circuitry 30 for connection to
conference bridges 36.

The preferred embodiment provides two ways for
initiating a conference call. The first way is to add one or more
parties to an existing two-party call. For this purpose, an
ADD button is provided by both the Collaboration Initiator
and the Rolodex, as illustrated in FIGS. 2A and 22. To add
a new party, a user selects the party to be added (by clicking
on the user's rolodex name or face icon as described above)
and clicks on the ADD button to invite that new party.
Additional parties can be invited in a similar manner. The
second way to initiate a conference call is to select the
parties in a similar manner and then click on the CALL
button (also provided in the Collaboration Initiator and
Rolodex windows on the user's CMW screen).

Another alternative embodiment is to initiate a conference
call from the beginning by clicking on a
CONFERENCE/MOSAIC icon/button/menu item on the CMW screen. This
could initiate a conference call with the call initiator as the
sole participant (i.e., causing a conference bridge to be
allocated such that the caller's image also appears on his/her
own screen in a video mosaic, which will also include
images of subsequently added participants). New
participants could be invited, for example, by selecting each new
party's face icon and then clicking on the ADD button.
Next to be considered with reference to FIGS. 26 and 27
is the manner in which conference calls are handled in the
preferred embodiment. For the purposes of this description
it will be assumed that up to four parties may participate in
a conference call. Each conference uses four bridge ports
136-1, 136-2, 136-3 and 136-4 provided on A/V Switching
Circuitry 30a, which are respectively coupled to
bidirectional audio/video lines 36-1, 36-2, 36-3 and 36-4 connected
to conference bridge 36. However, from this description it
will be apparent how a conference call may be provided for
additional parties, as well as simultaneously occurring
conference calls.

Once the Collaboration Initiator determines that a
conference is to be initiated, it queries the AVNM for a
conference service. If such a service is available, the
Collaboration Initiator requests the associated CBM to allocate a
conference bridge. The Collaboration Initiator then places
an audio/video call to the CBM to initiate the conference.
When the CBM accepts the call, the AVNM couples port 101
of CMW WS-1 to lines 36-1 of conference bridge 36 by a
connection 137 produced in response to callhandles created
for port 101 of WS-1 and bridge port 136-1.

When the user of WS-1 selects the appropriate face icon
and clicks the ADD button to invite a new participant to the
conference, which will be assumed to be CMW WS-3, the
Collaboration Initiator of WS-1 sends an add request to the
CBM. In response, the CBM calls WS-3 via WS-3 port 103.
When CBM initiates the call, the AVNM creates callhandles
for WS-3 port 103 and bridge port 136-2. When WS-3
accepts the call, its callhandle is made "active," resulting in
connection 138 being provided to connect WS-3 and lines
36-2 of conference bridge 36. Assuming the user of WS-1 next
adds CMW WS-5 and then CMW WS-8, callhandles for
their respective ports and bridge ports 136-3 and 136-4 are
created, in turn, as described above for WS-1 and WS-3,
resulting in connections 139 and 140 being provided to
connect WS-5 and WS-8 to conference bridge lines 36-3 and
36-4, respectively. The conferees WS-1, WS-3, WS-5 and
WS-8 are thus coupled to conference bridge ports 136-1,
136-2, 136-3 and 136-4, respectively, as shown in FIG. 26.

It will be understood that the video mosaicing circuitry 36
and audio mixing circuitry 38 incorporated in conference
bridge 36 operate as previously described, to form a
resulting four-picture mosaic (FIG. 8B) that is sent to all of the
conference participants, which in this example are CMWs
WS-1, WS-3, WS-5 and WS-8. Users may leave a
conference by just hanging up, which causes the AVNM to delete
the associated callhandles and to send a hangup notification
to CBM. When CBM receives the notification, it notifies all
other conference participants that the participant has exited.
In the preferred embodiment, this results in a blackened
portion of that participant's video mosaic image being
displayed on the screen of all remaining participants.

The manner in which the CBM and the conference bridge
36 operate when conference participants are located at
different sites will be evident from the previously described
operation of the cut-and-paste circuitry 39 (FIG. 10) with the
video mosaicing circuitry 36 (FIG. 7) and audio mixing
circuitry 38 (FIG. 9). In such case, each incoming single
video picture or mosaic from another site is connected to a
respective one of the conference bridge lines 36-1 to 36-4
via WAN gateway 40.

The situation in which a two-party call is converted to a
conference call will next be considered in connection with
FIG. 27 and the previously considered 2-party call
illustrated in FIG. 24. Converting this 2-party call to a
conference requires that this two-party call (such as illustrated
between WS-1 and WS-2 in FIG. 24) be rerouted
dynamically so as to be coupled through conference bridge 36.
When the user of WS-1 clicks on the ADD button to add a
new party (for example WS-5), the Collaboration Initiator
of WS-1 sends a redirect request to the AVNM, which
cooperates with the CBM to break the two-party connection
83 in FIG. 24, and then redirect the callhandles created for
ports 81 and 82 to callhandles created for bridge ports 136-1
and 136-2, respectively.

As shown in FIG. 27, this results in producing a
connection 86 between WS-1 and bridge port 136-1, and a
connection 87 between WS-2 and bridge port 136-2, thereby
creating a conference set-up between WS-1 and WS-2.
Additional conference participants can then be added as
described above for the situations in which
the conference is initiated by the user of WS-1 either
selecting multiple participants initially or merely selecting a
"conference" and then adding subsequent participants.

Having described the preferred manner in which
two-party calls and conference calls are set up in the preferred
embodiment, the preferred manner in which data
conferencing is provided between CMWs will next be described.

DATA CONFERENCING

Data conferencing is implemented in the preferred
embodiment by certain Snapshot Sharing software provided
at the CMW (see FIG. 20). This software permits a
"snapshot" of a selected portion of a participant's CMW screen
(such as a window) to be displayed on the CMW screens of
other selected participants (whether or not those participants
are also involved in a videoconference). Any number of
snapshots may be shared simultaneously. Once displayed,
any participant may then telepoint on or annotate the
snapshot, which animated actions and results will appear
(virtually simultaneously) on the screens of all other
participants. The annotation capabilities provided include lines
of several different widths and text of several different sizes.
Also, to facilitate participant identification, these
annotations may be provided in a different color for each
participant. Any annotation may also be erased by any participant.
FIG. 2B (lower left window) illustrates a CMW screen
having a shared graph on which participants have drawn and
typed to call attention to or supplement specific portions of
the shared image.

A participant may initiate data conferencing with selected
participants (selected and added as described above for
videoconference calls) by clicking on a SHARE button on
the screen (available in the Rolodex or Collaboration
Initiator windows, shown in FIG. 2A, as are CALL and ADD
buttons), followed by selection of the window to be shared.
When a participant clicks on his SHARE button, his
Collaboration Initiator module 161 (FIG. 20) queries the AVNM
to locate the Collaboration Initiators of the selected
participants, resulting in invocation of their respective
Snapshot Sharing modules 164. The Snapshot Sharing software
modules at the CMWs of each of the selected participants
query their local operating system 180 to determine
available graphic formats, and then send this information to the
initiating Snapshot Sharing module, which determines the
format that will produce the most advantageous display
quality and performance for each selected participant.

After the snapshot to be shared is displayed on all CMWs,
each participant may telepoint on or annotate the snapshot,
which actions and results are displayed on the CMW screens
of all participants. This is preferably accomplished by
monitoring the actions made at the CMW (e.g., by tracking mouse
movements) and sending these "operating system
commands" to the CMWs of the other participants, rather than
continuously exchanging bitmaps, as would be the case with
traditional "remote control" products.

As illustrated in FIG. 28, the original unchanged snapshot
is stored in a first bitmap 210a. A second bitmap 210b stores
the combination of the original snapshot and any
annotations. Thus, when desired (e.g., by clicking on a CLEAR
button located in each participant's Share window, as
illustrated in FIG. 2B), the original unchanged snapshot can be
restored (i.e., erasing all annotations) using bitmap 210a.
Selective erasures can be accomplished by copying into (i.e.,
restoring) the desired erased area of bitmap 210b with the

Having already described various preferred embodiments
and examples of audio/video/data teleconferencing, next to
be considered are various preferred ways of integrating
MMCR, MMM and MMDM with audio/video/data
teleconferencing in accordance with the invention. For this purpose,
basic preferred approaches and features of each will be
considered along with preferred associated hardware and
software.

MULTIMEDIA DOCUMENTS

In a preferred embodiment, the creation, storage,
retrieval and editing of multimedia documents serve as the
basic element common to MMCR, MMM and MMDM.
Accordingly, the preferred embodiment advantageously pro-
corresponding portion from bitaap 210a is vides a uriversa! [format for multimedia dc*uments. This 
Rather than causing a new Share window to be created fomat multimedia documents as a collection of 
whenever a snapshot is shared, it is possible to replace the individual components in multiple media combined with an 
contents of an existing Stare window with a new image. overa]1 structure ^ t^g component that captures the 
This can be achieved in either of two ways. First, the user idell tities, detailed dependencies, references to, andrelation- 
canchckonmeG^buttonandmensdectanwwindow M shi ps among the various other components. The information 
whose contents should replace the contents of the existing ovided b ^ stmcturing comport forms the basis for 
Share window. Second, the user can click on me REGRAB ^ k ^ order of i^uaks, temporal 
button to cause a (presumably modified) version of (he synctaoniz ation, etc., with respect to the composition of a 
original source window to replace the contents of the muUimedia document. FIG. 30 shows the structure of such 
existing Share window. This is particularly useful when one ^ documents as well as their relationship with editing and 
participant desires to share a long document that cannot be storage facilities, 
displayed on me screen in its entirety. For example, the user _ . , it . „ 

might display the first page of a spreadsheet oThis screen, ^ 0 <*>*f>™** f a nndtmedia document uses 

use the SHARE button to share that page, discuss and * s ° wn ^ for "f ™»*f ]» 

, * * •* *t. *. * 1 A ~* t addition, each component may use dedicated storage f acili- 

perhaps annotate it, then return to the spreadsheet appkea- ™ _ ' . y *"T . ,/ , l4 . * 

!• * * *il * *i_ nnrm at» u *♦ * ties. In the preferred embodiment, multimedia documents 

tion to position to the next page, use the REGRAB button to , u, e~ Lfu~4«„ . OT , 

. * j , . . are advantageously structured for authoring, storage, play- 
share me new page, and so on. This mechanism represents , _ . jL. L * • a \: 1 

1 a f- \ _i i- i . back and editing by storing some data under conventional 

a simple, effective step toward application sharing. £1 _ J r * * • • i 

rT^. • , / . . i_ x r j f t.- file systems and some data in special-purpose storage serv- 

Further, instead of sharing a snapshot of data on his m J ^ be ^ x^onventional File System 

c™screen,a^ 35 504 can ^ used to store & non-time-sensitive portions ofa 

^T^^°!! Slyb< t°I aV ^ a ^.^acfaievedvia multimedia document In particular, the following are 

the LOAD button, which causes a dialog box to appear, les of ^.time-sensitive data that can be stored in a 

promptmg the user to select a file. Conversely, via the SAVE conv 7 ntional ^ of me system: 

button, any snapshot may be saved, with all current anno- 4 a ^ , . ^ A ^ 

. . . 1. structured and unstructured text 
tauons. 40 

The capabilities described above were carefully selected Z raster 

to be particularly eflEective in environments where the prin- 3. structured graphics and vector graphics (e.g., 

cipal goal is to share existing information, rather than to PostScript) 

create new information. In particular, user interfaces are 4. references to files in other file systems (video, 
designed to make snapshot capture, telepointing and anno- 45 hi-fidelity audio, etc.) via pointers 
tation extremely easy to use. Nevertheless, it is also to be 5. restricted forms of executables 
understood that, instead of sharing snapshots, a blank 6. structure and timing information for all of the above 
Whiteboard" can also be shared (via the WHITEBOARD (spatial layout, order of presentation, hyperlinks, tern- 
button provided by the Rolodex, Collaboration Initiator, and poral synchronization, etc.) 

active call windows), and that more complex paintbox 50 Of particular importance in multimedia documents is 

capabilities could easily be added for application areas that support for time-sensitive media and media that have syn- 

require such capabilities. colonization requirements with other media components. 

As pointed out previously herein, important features of Some of these time-sensitive media can be stored on con- 
the present invention reside in the manner in which the ventional file systems while others may require special- 
capabilities and advantages of multimedia mail (MMM), 55 purpose storage facilities. 

multimedia conference recording (MMCR), and multimedia Examples of time-sensitive media that can be stored on 
document management (MMDM) are tightly integrated with conventional file systems are small audio files and short or 
audio/video/data teleconferencing to provide a multimedia low-quality video clips (e.g. as might be produced using 
collaboration system that facilitates an unusually higher Quicklime or Video for Windows). Other examples include 
level of communication and collaboration between geo- 60 window event lists as supported by the Window-Event 
graphically dispersed users than has heretofore been achiev- Record and Play system 512 shown in FIG. 30. This 
able by known prior an systems. FIG. 29 is a schematic and component allows for storing and replaying a user's inter- 
diagrammatic view illustrating how multimedia calls/ actions with application programs by capturing the requests 
conferences, MMCR, MMM and MMDM work together to and events exchanged between the client program and the 
provide the above-described features. In the preferred 65 window system in a time-stamped sequence. After this 
embodiment, MM Editing Utilities shown supplementing "record" phase, the resulting information is stored in a 
MMM and MMDM may be identical. conventional file that can later be retrieved and "played" 
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back. During playback the same sequence of window system 
requests and events reoccurs with the same relative timing as 
when they were recorded. In prior-art systems, this capabil- 
ity has been used for creating automated demonstrations. In 
the present invention it can be used, for example, to repro- 
duce annotated snapshots as they occurred at recording 
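The record and play behavior described for the Window-Event Record and Play system can be illustrated with a minimal sketch. It is not the implementation of system 512: the class and function names (`WindowEventRecorder`, `play_back`, `FakeClock`) are hypothetical, events are represented as plain strings, and the only property shown is that playback delivers events with the same relative timing as at recording.

```python
import time


class FakeClock:
    """Deterministic stand-in for a real clock, used only for demonstration."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class WindowEventRecorder:
    """Stores (offset_seconds, event) pairs in the order they were captured."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start_time = clock()  # offsets are measured from this instant
        self.events = []

    def capture(self, event):
        self.events.append((self.clock() - self.start_time, event))


def play_back(events, deliver, clock=time.monotonic, sleep=time.sleep):
    """Deliver each event at the same offset from start as when recorded."""
    start = clock()
    for offset, event in events:
        remaining = offset - (clock() - start)
        if remaining > 0:
            sleep(remaining)  # wait until the recorded offset is reached
        deliver(event)
```

With the default arguments the sketch uses the real monotonic clock; passing a `FakeClock` as both `clock` and `sleep` makes a replay run instantly, which is convenient for checking the timing logic.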

As described above in connection with collaborative 
workstation software, Snapshot Share 514 shown in FIG. 30 
is a utility used in multimedia calls and conferencing for 
capturing window or screen snapshots, sharing with one or 
more call or conference participants, and permitting group 
annotation, telepointing, and re-grabs. Here, this utility is 
adapted so that its captured images and window events can 
be recorded by the Window-Event Record and Play system 
512 while being used by only one person. By synchronizing 
events associated with a video or audio stream to specific 
frame numbers or time codes, a multimedia call or conference
can be recorded and reproduced in its entirety.
Similarly, the same functionality is preferably used to create 
multimedia mail whose authoring steps are virtually identi- 
cal to participating in a multimedia call or conference 
(though other forms of MMM are not precluded). 

Some time-sensitive media require dedicated storage 
servers in order to satisfy real-time requirements. High- 
quality audio/video segments, for example, require dedi- 
cated real-time audio/video storage servers. A preferred 
embodiment of such a server will be described later. Next to 
be considered is how the current invention guarantees syn- 
chronization between different media components. 

MEDIA SYNCHRONIZATION 

A preferred manner for providing multimedia synchroni- 
zation in the preferred embodiment will next be considered. 
Only multimedia documents with real-time material need 
include synchronization functions and information. Syn- 
chronization for such situations may be provided as 
described below. 

Audio or video segments can exist without being accom- 
panied by the other. If audio and video are recorded simul- 
taneously ("co-recorded"), the preferred embodiment allows 
the case where their streams are recorded and played back 
with automatic synchronization — as would result from con- 
ventional VCRs, laser disks, or time-division multiplexed 
("interleaved") audio/video streams. This eliminates the need
to tightly synchronize (i.e., "lip-sync") separate audio and
video sequences. Rather, reliance is on the co-recording
capability of the Real-Time Audio/Video Storage Server 502
to deliver all closely synchronized audio and video directly
at its signal output.

Each recorded video sequence is tagged with time codes
(e.g., SMPTE at 1/30 second intervals) or video frame numbers.
Each recorded audio sequence is tagged with time
codes (e.g., SMPTE or MIDI) or, if co-recorded with video,
video frame numbers.
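The relationship between a time code and a video frame number can be shown with a short conversion sketch. It assumes a non-drop-frame `HH:MM:SS:FF` time code at exactly 30 frames per second, which matches the 1/30 second interval mentioned above but is otherwise an assumption; the function names are hypothetical and not taken from the specification.

```python
FPS = 30  # assumption: non-drop-frame time code, 30 frames per second


def timecode_to_frame(timecode: str, fps: int = FPS) -> int:
    """Convert 'HH:MM:SS:FF' into an absolute frame number."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid time code: {timecode}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frame_to_timecode(frame: int, fps: int = FPS) -> str:
    """Convert an absolute frame number back into 'HH:MM:SS:FF'."""
    total_seconds, frames = divmod(frame, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

For example, one minute and fifteen frames into a recording corresponds to frame 1815 under these assumptions, and converting that frame number back yields the original time code.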

The preferred embodiment also provides synchronization 
between window events and audio and/or video streams. The 
following functions are supported: 

1. Media-time-driven Synchronization: synchronization 
of window events to an audio, video, or audio/video 
stream, using the real-time media as the timing source. 

2. Machine-time-driven Synchronization:

a. synchronization of window events to the system 
clock 

b. synchronization of the start of an audio, video, or 
audio/video segment to the system clock 

If no audio or video is involved, machine-time-driven
synchronization is used throughout the document. Whenever
audio and/or video is playing, media-time-driven synchronization
is used. The system supports transition between machine-time
and media-time synchronization whenever an audio/video
segment is started or stopped.

As an example, viewing a multimedia document might 
proceed as follows: 

Document starts with an annotated share (machine-time-driven
synchronization).
Next, start audio only (a "voice annotation") as text and
graphical annotations on the share continue (audio is
timing source for window events).
Audio ends, but annotations continue (machine-time-driven
synchronization).
Next, start co-recorded audio/video continuing with further
annotations on same share (audio is timing source
for window events).
Next, start a new share during the continuing audio/video
recording; annotations happen on both shares (audio is
timing source for window events).
Audio/video stops, annotations on both shares continue
(machine-time-driven synchronization).
Document ends.
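The transitions in the example above can be sketched as a small selector that decides which clock times the window events. This is only an illustration of the rule stated in this section (media time while any audio/video segment plays, machine time otherwise); the class name `TimingSourceSelector` and its methods are hypothetical.

```python
MACHINE_TIME = "machine-time"
MEDIA_TIME = "media-time"


class TimingSourceSelector:
    """Tracks playing audio/video segments and picks the timing source."""

    def __init__(self):
        self._playing = set()  # identifiers of segments currently playing

    @property
    def mode(self):
        return MEDIA_TIME if self._playing else MACHINE_TIME

    def start(self, segment_id):
        """An audio, video, or audio/video segment starts."""
        self._playing.add(segment_id)
        return self.mode

    def stop(self, segment_id):
        """A segment stops; fall back to machine time if none remain."""
        self._playing.discard(segment_id)
        return self.mode

    def event_time(self, system_clock, media_clock):
        """Return the time used to schedule the next window event."""
        return media_clock() if self._playing else system_clock()
```

Starting a new share while audio/video is playing does not call `start` or `stop`, so the mode stays media-time, as in the fifth step of the example.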

AUDIO/VIDEO STORAGE

As described above, the present invention can include
many special-purpose servers that provide storage of time- 
sensitive media (e.g. audio/video streams) and support coor- 
dination with other media. This section describes the pre- 
ferred embodiment for audio/video storage and recording 
services. 

Although storage and recording services could be pro- 
vided at each CMW, it is preferable to employ a centralized 
server 502 coupled to MLAN 10, as illustrated in FIG. 31. 
A centralized server 502, as shown in FIG. 31, provides the 
following advantages: 

1. The total amount of storage hardware required can be 
far less (due to better utilization resulting from statis- 
tical averaging). 

2. Bulky and expensive compression/decompression 
hardware can be pooled on the storage servers and 
shared by multiple clients. As a result, fewer 
compression/decompression engines of higher perfor- 
mance are required than if each workstation were 
equipped with its own compression/decompression 
hardware. 

3. Also, more costly centralized codecs can be used to
transfer mail wide area among campuses at far lower
cost than attempting to use data WAN technologies.

4. File system administration (e.g., backups, file system
replication, etc.) is far less costly and offers higher
performance.

The Real-Time Audio/Video Storage Server 502 shown in
FIG. 31A structures and manages the audio/video files
recorded and stored on its storage devices. Storage devices
may typically include computer-controlled VCRs, as well as
rewritable magnetic or optical disks. For example, server
502 in FIG. 31A includes disks 60e for recording and
playback. Analog information is transferred between disks
60e and the A/V Switching Circuitry 30 via analog I/O 62.
Control is provided by control 64 coupled to Data LAN hub
25.

At a high level, the centralized audio/video storage and
playback server 502 in FIG. 31A performs the following
functions:

File Management
It provides mechanisms for creating, naming, time-stamping,
storing, retrieving, copying, deleting, and
playing back some or all portions of an audio/video
file.

File Transfer and Replication
The audio/video file server supports replication of files
on different disks managed by the same file server to
facilitate simultaneous access to the same files.
Moreover, file transfer facilities are provided to
support transmission of audio/video files between
itself and other audio/video storage and playback
engines. File transfer can also be achieved by using
the underlying audio/video network facilities: servers
establish a real-time audio/video network connection
between themselves so one server can "play
back" a file while the second server simultaneously
records it.

Disk Management
The storage facilities support specific disk allocation,
garbage collection and defragmentation facilities.
They also support mapping disks with other disks
(for replication and staging modes, as appropriate)
and mapping disks, via I/O equipment, with the
appropriate Video/Audio network port.

Synchronization Support
Synchronization between audio and video is ensured by
the multiplexing scheme used by the storage media,
typically by interleaving the audio and video streams
in a time-division-multiplexed fashion. Further, if
synchronization is required with other stored media
(such as window system graphics), then frame
numbers, time codes, or other timing events are
generated by the storage server. An advantageous
way of providing this synchronization in the preferred
embodiment is to synchronize record and
playback to received frame number or time code
events.

Searching
To support intra-file searching, at least start, stop,
pause, fast forward, reverse, and fast reverse operations
are provided. To support inter-file searching,
audio/video tagging, or more generalized "go-to"
operations and mechanisms, such as frame numbers
or time code, are supported at a search-function
level.

Connection Management
The server handles requests for audio/video network
connections from client programs (such as video
viewers and editors running on client workstations)
for real-time recording and real-time playback of
audio/video files.
Next to be considered is how centralized audio/video 
storage servers provide for real-time recording and playback 
of video streams. 

Real-Time Disk Delivery 

To support real-time audio/video recording and playback, 
the storage server needs to provide a real-time transmission 
path between the storage medium and the appropriate audio/ 
video network port for each simultaneous client accessing 
the server. For example, if one user is viewing a video file 
at the same time several other people are creating and storing 
new video files on the same disk, multiple simultaneous 
paths to the storage media are required. Similarly, video mail
sent to large distribution groups, video databases, and similar
functions may also require simultaneous access to the
same video files, again imposing multiple access requirements
on the video storage capabilities.

For storage servers that are based on computer-controlled
VCRs or rewritable laserdisks, a real-time transmission path
is readily available through the direct analog connection
between the disk or tape and the network port. However,
because of this single direct connection, each VCR or
laserdisk can only be accessed by one client program at the
same time (multi-head laserdisks are an exception).
Therefore, storage servers based on VCRs and laserdisks are
difficult to scale for multiple access usage. In the preferred
embodiment, multiple access to the same material is provided
by file replication and staging, which greatly increases
storage requirements and the need for moving information
quickly among storage media units serving different users.

Video systems based on magnetic disks are more readily
scalable for simultaneous use by multiple people. A generalized
hardware implementation of such a scalable storage
and playback system 502 is illustrated in FIG. 32. Individual
I/O cards 530 supporting digital and analog I/O are linked by
intra-chassis digital networking (e.g., buses) for file transfer
within chassis 532 holding some number of these cards.
Multiple chassis 532 are linked by inter-chassis networking.
The Digital Video Storage System available from Parallax
Graphics is an example of such a system implementation.

The bandwidth available for the transfer of files among
disks is ultimately limited by the bandwidth of these intra-chassis
and inter-chassis networks. For systems that use
sufficiently powerful video compression schemes, real-time
delivery requirements for a small number of users can be
met by existing file system software (such as the Unix file
system), provided that the block-size of the storage system
is optimized for video storage and that sufficient buffering is
provided by the operating system software to guarantee
continuous flow of the audio/video data.

Special-purpose software/hardware solutions can be provided
to guarantee higher performance under heavier usage
or higher bandwidth conditions. For example, a higher
throughput version of FIG. 32 is illustrated in FIG. 33,
which uses crosspoint switching, such as provided by SCSI
Crossbar 540, which increases the total bandwidth of the
inter-chassis and intra-chassis network, thereby increasing
the number of possible simultaneous file transfers.

Real-Time Network Delivery

By using the same audio/video format as used for audio/
video teleconferencing, the audio/video storage system can
leverage the previously described network facilities: the
MLANs 10 can be used to establish a multimedia network
connection between client workstations and the audio/video
storage servers. Audio/Video editors and viewers running on
the client workstation use the same software interfaces as the
multimedia teleconferencing system to establish these network
connections.

The resulting architecture is shown in FIG. 31B. Client
workstations use the existing audio/video network to connect
to the storage server's network ports. These network
ports are connected to compression/decompression engines
that plug into the server bus. These engines compress the
audio/video streams that come in over the network and store
them on the local disk. Similarly, for playback, the server
reads stored video segments from its local disk and routes
them through the decompression engines back to client
workstations for local display.

The present invention allows for alternative delivery
strategies. For example, some compression algorithms are
asymmetric, meaning that decompression requires much less
compute power than compression. In some cases, real-time
decompression can even be done in software, without requiring
any special-purpose decompression hardware. As a
result, there is no need to decompress stored audio and video
on the storage server and play it back in real time over the
network. Instead, it can be more efficient to transfer an entire
audio/video file from the storage server to the client
workstation, cache it on the workstation's disk, and play it
back locally. These observations lead to a modified architecture
as presented in FIG. 31C. In this architecture, clients
interact with the storage server as follows:

To record video, clients set up real-time audio/video
network connections to the storage server as before
(this connection could make use of an analog line).
In response to a connection request, the storage server
allocates a compression module to the new client.
As soon as the client starts recording, the storage server
routes the output from the compression hardware to an
audio/video file allocated on its local storage devices.
For playback, this audio/video file gets transferred over
the data network to the client workstation and pre-staged
on the workstation's local disk.
The client uses local decompression software and/or
hardware to play back the audio/video on its local audio
and video hardware.

This approach frees up audio/video network ports and
compression/decompression engines on the server. As a
result, the server can be scaled to support a higher number of
simultaneous recording sessions, thereby further reducing
the cost of the system. Note that such an architecture can be
part of a preferred embodiment for reasons other than
compression/decompression asymmetry (such as the economics
of the technology of the day, existing embedded base
in the enterprise, etc.).
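The client/server interaction listed above can be sketched as follows. This is a simplified model under stated assumptions, not the architecture of FIG. 31C itself: `StorageServer`, `play_locally` and their methods are hypothetical names, and the standard-library `zlib` codec merely stands in for the audio/video compression hardware. The point illustrated is that a compression module is held only while recording, and playback consumes no server-side codec.

```python
import zlib


class StorageServer:
    """Pools compression modules for recording; playback is a file transfer."""

    def __init__(self, compression_modules: int):
        self._free = list(range(compression_modules))
        self._sessions = {}  # client id -> allocated compression module
        self._files = {}     # file name -> list of compressed chunks

    @property
    def free_modules(self) -> int:
        return len(self._free)

    def connect_for_recording(self, client: str) -> int:
        """Allocate a compression module in response to a connection request."""
        if not self._free:
            raise RuntimeError("no compression module available")
        module = self._free.pop()
        self._sessions[client] = module
        return module

    def record(self, client: str, file_name: str, raw_chunk: bytes) -> None:
        """Route the compressed output into a file on local storage."""
        if client not in self._sessions:
            raise KeyError(f"{client} has no recording connection")
        self._files.setdefault(file_name, []).append(zlib.compress(raw_chunk))

    def disconnect(self, client: str) -> None:
        """Return the client's compression module to the pool."""
        self._free.append(self._sessions.pop(client))

    def fetch_file(self, file_name: str) -> list:
        """Playback path: hand over the still-compressed file for pre-staging."""
        return list(self._files[file_name])


def play_locally(compressed_chunks) -> bytes:
    """Client side: decompress with local software after the file transfer."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed_chunks)
```

Because `fetch_file` never touches the module pool, any number of clients can retrieve a file while every compression module is busy recording, which is the scaling benefit the paragraph above describes.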

MULTIMEDIA CONFERENCE RECORDING 

Multimedia conference recording (MMCR) will next be 
considered. For full-feature multimedia desktop calls and 
conferencing (e.g. audio/video calls or conferences with 
snapshot share), recording (storage) capabilities are prefer- 
ably provided for audio and video of all parties, and also for 
all shared windows, including any telepointing and annota- 
tions provided during the teleconference. Using the multi- 
media synchronization facilities described above, these 
capabilities are provided in a way such that they can be 
replayed with accurate correspondence in time to the 
recorded audio and video, such as by synchronizing to frame 
numbers or time code events. 

A preferred way of capturing audio and video from calls 
would be to record all calls and conferences as if they were 
multi-party conferences (even for two-party calls), using 
video mosaicing, audio mixing and cut-and-pasting, as previously
described in connection with FIGS. 7-11. It will be
appreciated that MMCR as described will advantageously 
permit users at their desktop to review real-time collabora- 
tion as it previously occurred, including during a later 
teleconference. The output of a MMCR session is a multi- 
media document that can be stored, viewed, and edited using 
the multimedia document facilities described earlier. 

FIG. 31D shows how conference recording relates to the
various system components described earlier. The Multimedia
Conference Record/Play system 522 provides the user
with the additional GUIs (graphical user interfaces) and
other functions required to provide the previously described
MMCR functionality.

The Conference Invoker 518 shown in FIG. 31D is a
utility that coordinates the audio/video calls that must be
made to connect the audio/video storage server 502 with
special recording outputs on conference bridge hardware (35
in FIG. 3). The resulting recording is linked to information
identifying the conference, a function also performed by this
utility.

MULTIMEDIA MAIL 

Now considering multimedia mail (MMM), it will be
understood that MMM adds to the above-described MMCR
the capability of delivering delayed collaboration, as well as
the additional ability to review the information multiple
times and, as described hereinafter, to edit, re-send, and
archive it. The captured information is preferably a superset
of that captured during MMCR, except that no other user is
involved and the user is given a chance to review and edit
before sending the message.

The Multimedia Mail system 524 in FIG. 31D provides
the user with the additional GUIs and other functions
required to provide the previously described MMM functionality.
Multimedia Mail relies on a conventional Email
system 506 shown in FIG. 31D for creating, transporting,
and browsing messages. However, multimedia document
editors and viewers are used for creating and viewing
message bodies. Multimedia documents (as described
above) consist of time-insensitive components and time-sensitive
components. The Conventional Email system 506
relies on the Conventional File system 504 and Real-Time
Audio/Video Storage Server 502 for storage support. The
time-insensitive components are transported within the Conventional
Email system 506, while the real-time components
may be separately transported through the audio/video network
using file transfer utilities associated with the Real-Time
Audio/Video Storage Server 502.
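The split transport described above can be sketched as a simple partition of a document's components. The names `Component` and `plan_delivery` are hypothetical, and representing a real-time component inside the mail body as a `pointer:` entry is an assumption made for illustration, based on the earlier statement that conventional files can hold references to files in other file systems via pointers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    name: str
    real_time: bool  # True if stored on the real-time audio/video server


def plan_delivery(components: List[Component]) -> Dict[str, List[str]]:
    """Decide which transport carries each component of a mail message."""
    email_parts = []   # carried by the conventional Email system
    av_transfers = []  # carried by audio/video server file transfer
    for component in components:
        if component.real_time:
            av_transfers.append(component.name)
            # Assumption: the mail body keeps only a pointer to the A/V file.
            email_parts.append(f"pointer:{component.name}")
        else:
            email_parts.append(component.name)
    return {"email": email_parts, "av_file_transfer": av_transfers}
```

A message containing a text note, a PostScript chart and a recorded greeting would thus send the first two inside the Email system and move only the greeting between storage servers.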

MULTIMEDIA DOCUMENT MANAGEMENT 

Multimedia document management (MMDM) provides
long-term, high-volume storage for MMCR and MMM. The
MMDM system assists in providing the following capabilities
to a CMW user:

1. Multimedia documents can be authored as mail in the
MMM system or as call/conference recordings in the
MMCR system and then passed on to the MMDM
system.

2. To the degree supported by external compatible multimedia
editing and authoring systems, multimedia
documents can also be authored by means other than
MMM and MMCR.

3. Multimedia documents stored within the MMDM system
can be reviewed and searched.

4. Multimedia documents stored within the MMDM system
can be used as material in the creation of subsequent
MMM.

5. Multimedia documents stored within the MMDM system
can be edited to create other multimedia documents.

The Multimedia Document Management system 526 in
FIG. 31D provides the user with the additional GUIs and
other functions required to provide the previously described
MMDM functionality. The MMDM includes sophisticated
searching and editing capabilities in connection with the
MMDM multimedia document such that a user can rapidly
access desired selected portions of a stored multimedia
document. The Specialized Search system 520 in FIG. 30
comprises utilities that allow users to do more sophisticated
searches across and within multimedia documents. This
includes context-based and content-based searches
(employing operations such as speech and image
recognition, information filters, etc.), time-based searches,
and event-based searches (window events, call management
events, speech/audio events, etc.).

CLASSES OF COLLABORATION

The resulting multimedia collaboration environment
achieved by the above-described integration of audio/video/
data teleconferencing, MMCR, MMM and MMDM is illustrated
in FIG. 34. It will be evident that each user can
collaborate with other users in real-time despite separations
in space and time. In addition, collaborating users can access
information already available within their computing and
information systems, including information captured from
previous collaborations. Note in FIG. 34 that space and time
separations are supported in the following ways:

1. Same time, different place
Multimedia calls and conferences

2. Different time, same place
MMDM access to stored MMCR and MMM
information, or use of MMM directly (i.e., copying
mail to oneself)

3. Different time, different place
MMM

4. Same time, same place
Collaborative, face-to-face, multimedia document creation

By use of the same user interfaces and network functions,
the present invention smoothly spans these three venues.

REMOTE ACCESS TO EXPERTISE

In order to illustrate how the present invention may be
implemented and operated, an exemplary preferred embodiment
will be described having features applicable to the
aforementioned scenario involving remote access to expertise.
It is to be understood that this exemplary embodiment
is merely illustrative, and is not to be considered as limiting
the scope of the invention, since the invention may be
adapted for other applications (such as in engineering and
manufacturing) or uses having more or less hardware, software
and operating features and combined in various ways.

Consider the following scenario involving access from
remote sites to an in-house corporate "expert" in the trading
of financial instruments such as in the securities market:

The focus of the scenario revolves around the activities of
a trader who is a specialist in securities. The setting is the
start of his day at his desk in a major financial center (NYC)
at a major U.S. investment bank.

The Expert has been actively watching a particular security
over the past week and upon his arrival into the office,
he notices it is on the rise. Before going home last night, he
previously set up his system to filter overnight news on a

with their client base. Also among the recipients is an analyst
in the research department and his counterpart in London.
The Expert, in preparation for his previously established
"on-call" office hours, consults with others within the corporation
(using the videoconferencing and other collaborative
techniques described above), accesses company records
from his CMW, and analyzes such information, employing
software-assisted analytic techniques. His office hours are
now at hand, so he enters "intercom" mode, which enables
incoming calls to appear automatically (without requiring
the Expert to "answer his phone" and elect to accept or reject
the call).

The Expert's computer beeps, indicating an incoming
call, and the image of a field representative 201 and his client
202 who are located at a bank branch somewhere in the U.S.
appears in video window 203 of the Expert's screen (shown
in FIG. 35). Note that unless the call is converted to a
"conference" call (whether explicitly via a menu selection or
implicitly by calling two or more other participants or
adding a third participant to a call), the callers will see only
each other in the video window and will not see themselves
as part of a video mosaic.

Also illustrated on the Expert's screen in FIG. 35 is the
Collaboration Initiator window 204 from which the Expert
can (utilizing Collaboration Initiator software module 161
shown in FIG. 20) initiate and control various collaborative
sessions. For example, the user can initiate with a selected
participant a video call (CALL button) or the addition of that
selected participant to an existing video call (ADD button),
as well as a share session (SHARE button) using a selected
window or region on the screen (or a blank region via the
WHITEBOARD button for subsequent annotation). The
user can also invoke his MAIL software (MAIL button) and
prepare outgoing or check incoming Email messages (the
presence of which is indicated by a picture of an envelope
in the dog's mouth in In Box icon 205), as well as check for
"I called" messages from other callers (MESSAGES button)
left via the LEAVE WORD button in video window 203.
Video window 203 also contains buttons from which many
of these and certain additional features can be invoked, such
as hanging up a video call (HANGUP button), putting a call
on hold (HOLD button), resuming a call previously put on
hold (RESUME button) or muting the audio portion of a call
(MUTE button). In addition, the user can invoke the recording
of a conference by the conference RECORD button.
Also present on the Expert's screen is a desktop
window 206 containing icons from which other programs
(whether or not part of this invention) can be launched.

Returning to the example, the Expert is now engaged in
a videoconference with field representative 201 and his
client 202. In the course of this videoconference, as illustrated
in FIG. 36, the field representative shares with the
Expert a graphical image 210 (pie chart of client portfolio
holdings) of his client's portfolio holdings (by clicking on
his SHARE button, corresponding to the SHARE button in
video window 203 of the Expert's screen, and selecting that
image from his screen, resulting in the shared image appear-

particular family of securities and a security within that ing in the Share window 211 of the screen of all participants 

family. He scans the filtered news and sees a story that may to the share) and begins to discuss the client's investment 

have a long-term impact on this security in question. He 60 dilemma. Hie field representative also invokes a command 

believes he needs to act now in order to get a good price on to secretly bring up the client profile on the Expert's screen, 

the security. Also, through filtered mail, he sees that his After considering this information, reviewing the shared 

counterpart in London, who has also been watching this portfolio and asking clarifying questions, the Expert illus- 

security, is interested in getting our Expert's opinion once he trates his advice by creating (using his own modeling 

arrives at work. 65 software) and sharing a new graphical image 220 (FIG. 37) 

The Expert issues a multimedia mail message on the with the field representative and his client Either party to the 

security to the head of sales worldwide for use in working share can annotate that image using the drawing tools 221 
(and the TEXT button, which permits typed characters to be
displayed) provided within Share window 211, or "regrab" a
modified version of the original image (by using the REGRAB
button), or remove all such annotations (by using the CLEAR
button of Share window 211), or "grab" a new image to share
(by clicking on the GRAB button of Share window 211 and
selecting that new image from the screen). In addition, any
participant to a shared session can add a new participant by
selecting that participant from the rolodex or quick-dial list
(as described above for video calls and for data conferencing)
and clicking the ADD button of Share window 211. One can also
save the shared image (SAVE button), load a previously saved
image to be shared (LOAD button), or print an image (PRINT
button).

While discussing the Expert's advice, field representative
201 makes annotations 222 to image 220 in order to illustrate
his concerns. While responding to the concerns of field
representative 201, the Expert hears a beep and receives a
visual notice (New Call window 223) on his screen (not
visible to the field representative and his client), indicating
the existence of a new incoming call and identifying the
caller. At this point, the Expert can accept the new call
(ACCEPT button), refuse the new call (REFUSE button,
which will result in a message being displayed on the
caller's screen indicating that the Expert is unavailable) or
add the new caller to the Expert's existing call (ADD
button). In this case, the Expert elects yet another option (not
shown): to defer the call and leave the caller a standard
message that the Expert will call back in X minutes (in this
case, 1 minute). The Expert then elects also to defer his
existing call, telling the field representative and his client
that he will call them back in 5 minutes, and then elects to
return the initial deferred call.

It should be noted that the Expert's act of deferring a call
results not only in a message being sent to the caller, but also
in the caller's name (and perhaps other information associated
with the call, such as the time the call was deferred or
is to be resumed) being displayed in a list 230 (see FIG. 38)
on the Expert's screen from which the call can be reinitiated.
Moreover, the "state" of the call (e.g., the information being
shared) is retained so that it can be recreated when the call
is reinitiated. Unlike a "hold" (described above), deferring a
call actually breaks the logical and physical connections,
requiring that the entire call be reinitiated by the
Collaboration Initiator and the AVNM as described above.

Upon returning to the initial deferred call, the Expert
engages in a videoconference with caller 231, a research
analyst who is located 10 floors up from the Expert with a
complex question regarding a particular security. Caller 231
decides to add London expert 232 to the videoconference
(via the ADD button in Collaboration Initiator window 204)
to provide additional information regarding the factual
history of the security. Upon selecting the ADD button, video
window 203 now displays, as illustrated in FIG. 38, a video
mosaic consisting of three smaller images (instead of a
single large image displaying only caller 231) of the Expert
233, caller 231 and London expert 232.

During this videoconference, an urgent PRIORITY
request (New Call window 234) is received from the
Expert's boss (who is engaged in a three-party
videoconference call with two members of the bank's operations
department and is attempting to add the Expert to that call
to answer a quick question). The Expert puts his three-party
videoconference on hold (merely by clicking the HOLD
button in video window 203) and accepts (via the ACCEPT
button of New Call window 234) the urgent call from his
boss, which results in the Expert being added to the boss'
three-party videoconference call. As illustrated in FIG. 39,
video window 203 is now replaced with a four-person video
mosaic representing a four-party conference call consisting
of the Expert 233, his boss 241 and the two members 242 and
243 of the bank's operations department. The Expert quickly
answers the boss' question and, by clicking on the RESUME
button (of video window 203) adjacent to the names of the
other participants to the call on hold, simultaneously hangs
up on the conference call with his boss and resumes his
three-party conference call involving the securities issue, as
illustrated in video window 203 of FIG. 40.

While that call was on hold, however, analyst 231 and
London expert 232 were still engaged in a two-way
videoconference (with a blackened portion of the video mosaic on
their screens indicating that the Expert was on hold) and had
shared and annotated a graphical image 250 (see annotations
251 to image 250 of FIG. 40) illustrating certain financial
concerns. Once the Expert resumed the call, analyst 231
added the Expert to the share session, causing Share window
211 containing annotated image 250 to appear on the
Expert's screen. Optionally, snapshot sharing could progress
while the video was on hold.

Before concluding his conference regarding the securities,
the Expert receives notification of an incoming multimedia
mail message, e.g., a beep accompanied by the appearance
of an envelope in the dog's mouth in In Box icon 205
shown in FIG. 40. Once he concludes his call, he quickly
scans his incoming multimedia mail message by clicking on
In Box icon 205, which invokes his mail software, and then
selecting the incoming message for a quick scan, as generally
illustrated in the top two windows of FIG. 2B. He
decides it can wait for further review as the sender is an
[...] other than the one helping on his security question.

He then reinitiates (by selecting deferred call indicator
230, shown in FIG. 40) his deferred call with field
representative 201 and his client 202, as shown in FIG. 41. Note
that the "state" of the call is also recreated, including
restoration of previously shared image 220 with annotations
222 as they existed when the call was deferred (see FIG. 37).
Note also in FIG. 41 that, having reviewed his only unread
incoming multimedia mail message, In Box icon 205 no
longer shows an envelope in the dog's mouth, indicating that
the Expert currently has no unread incoming messages.

As the Expert continues to provide advice and pricing
information to field representative 201, he receives
notification of three priority calls 261-263 in short succession.
Call 261 is the Head of Sales for the Chicago office. Working
at home, she had instructed her CMW to alert her of all
urgent news or messages, and was subsequently alerted to
the arrival of the Expert's earlier multimedia mail message.
Call 262 is an urgent international call. Call 263 is from the
Head of Sales in Los Angeles. The Expert quickly winds
down and then concludes his call with field representative
201.

The Expert notes from call indicator 262 that this call is
not only an international call (shown in the top portion of the
New Call window), but he realizes it is from a laptop user
in the field in Central Mexico. The Expert elects to prioritize
his calls in the following manner: 262, 261 and 263. He
therefore quickly answers call 261 (by clicking on its
ACCEPT button) and puts that call on hold while deferring
call 263 in the manner discussed above. He then proceeds to
accept the call identified by international call indicator 262.
Note in FIG. 42 deferred call indicator 271 and the
indicator for the call placed on hold (next to the highlighted
RESUME button in video window 203), as well as the image
of caller 272 from the laptop in the field in Central Mexico.
Although Mexican caller 272 is outdoors and has no direct
access to any wired telephone connection, his laptop has two
wireless modems permitting dial-up access to two data
connections in the nearest field office (through which his
calls were routed). The system automatically (based upon
the laptop's registered service capabilities) allocated one
connection for an analog telephone voice call (using his
laptop's built-in microphone and speaker and the Expert's
computer-integrated telephony capabilities) to provide audio
teleconferencing. The other connection provides control,
data conferencing and one-way digital video (i.e., the laptop
user cannot see the image of the Expert) from the laptop's
built-in camera, albeit at a very slow frame rate (e.g., 3-10
small frames per second) due to the relatively slow dial-up
phone connection.

It is important to note that, despite the limited capabilities
of the wireless laptop equipment, the present invention
accommodates such capabilities, supplementing an audio
telephone connection with limited (i.e., relatively slow)
one-way video and data conferencing functionality. As
telephony and video compression technologies improve, the
present invention will accommodate such improvements
automatically. Moreover, even with one participant to a
teleconference having limited capabilities, other participants
need not be reduced to this "lowest common denominator."
For example, additional participants could be added to the
call illustrated in FIG. 42 as described above, and such
participants could have full videoconferencing, data
conferencing and other collaborative functionality vis-a-vis one
another, while having limited functionality only with caller
272.

As his day evolved, the off-site salesperson 272 in Mexico
was notified by his manager through the laptop about a new
security and became convinced that his client would have
particular interest in this issue. The salesperson therefore
decided to contact the Expert as shown in FIG. 42. While
discussing the security issues, the Expert again shares all
captured graphs, charts, etc.

The salesperson 272 also needs the Expert's help on
another issue. He has hard copy only of a client's portfolio
and needs some advice on its composition before he meets
with the client tomorrow. He says he will fax it to the Expert
for analysis. Upon receiving the fax (on his CMW, via
computer-integrated fax), the Expert asks if he should either
send the Mexican caller a "QuickTime" movie (a lower
quality compressed video standard from Apple Computer)
on his laptop tonight or send a higher-quality CD via FedX
tomorrow, the notion being that the Expert can produce an
actual video presentation with models and annotations in
video form. The salesperson can then play it to his client
tomorrow afternoon and it will be as if the Expert is in the
room. The Mexican caller decides he would prefer the CD.

Continuing with this scenario, the Expert learns, in the
course of his call with remote laptop caller 272, that he
missed an important issue during his previous quick scan of
his incoming multimedia mail message. The Expert is upset
that the sender of the message did not utilize the "video
highlight" feature to highlight this aspect of the message.
This feature permits the composer of the message to define
"tags" (e.g., by clicking a TAG button, not shown) during
record time which are stored with the message along with a
"time stamp," and which cause a predefined or selectable
audio and/or visual indicator to be played/displayed at that
precise point in the message during playback.

Because this issue relates to the caller that the Expert has
on hold, the Expert decides to merge the two calls together
by adding the call on hold to his existing call. As noted
above, both the Expert and the previously held caller will
have full video capabilities vis-a-vis one another and will
see a three-way mosaic image (with the image of caller 272
at a slower frame rate), whereas caller 272 will have access
only to the audio portion of this three-way conference call,
though he will have data conferencing functionality with
both of the other participants.

The Expert forwards the multimedia mail message to both
caller 272 and the other participant, and all three of them
review the video enclosure in greater detail and discuss the
concern raised by caller 272. They share certain relevant
data as described above and realize that they need to ask a
quick question of another remote expert. They add that
expert to the call (resulting in the addition of a fourth image
to the video mosaic, also not shown) for less than a minute
while they obtain a quick answer to their question. They then
continue their three-way call until the Expert provides his
advice and then adjourns the call.

The Expert composes a new multimedia mail message,
recording his image and audio synchronized (as described
above) to the screen displays resulting from his simultaneous
interaction with his CMW (e.g., running a program
that performs certain calculations and displays a graph while
the Expert illustrates certain points by telepointing on the
screen, during which time his image and spoken words are
also captured). He sends this message to a number of
salesforce recipients whose identities are determined
automatically by an outgoing mail filter that utilizes a database
of information on each potential recipient (e.g., selecting
only those whose clients have investment policies which
allow this type of investment).

The Expert then receives an audio and visual reminder
(not shown) that a particular video feed (e.g., a short
segment of a financial cable television show featuring new
financial instruments) will be triggered automatically in a
few minutes. He uses this time to search his local securities
database, which is dynamically updated from financial
information feeds (e.g., prepared from a broadcast textual stream
of current financial events with indexed headers that
automatically applies data filters to select incoming events
relating to certain securities). The video feed is then
displayed on the Expert's screen and he watches this short
video segment.

After analyzing this extremely up-to-date information, the
Expert then reinitiates his previously deferred call, from
indicator 271 shown in FIG. 42, which he knows is from the
Head of Sales in Los Angeles, who is seeking to provide his
prime clients with securities advice on another securities
transaction based upon the most recent available information.
The Expert's call is not answered directly, though he
receives a short prerecorded video message (left by the
caller who had to leave his home for a meeting across town
soon after his priority message was deferred) asking that the
Expert leave him a multimedia mail reply message with
advice for a particular client, and explaining that he will
access this message remotely from his laptop as soon as his
meeting is concluded. The Expert complies with this request
and composes and sends this mail message.

The Expert then receives an audio and visual reminder on
his screen indicating that his office hours will end in two
minutes. He switches from "intercom" mode to "telephone"
mode so that he will no longer be disturbed without an
opportunity to reject incoming calls via the New Call
window described above. He then receives and accepts a
final call concerning an issue from an electronic meeting
several months ago, which was recorded in its entirety.
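
The "video highlight" feature mentioned in this scenario (tags defined at record time, stored with a time stamp, and replayed as an audio or visual cue at that point during playback) can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names, the `indicator` strings and the fixed-step playback loop are hypothetical and are not taken from the specification.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class HighlightTag:
    # Hypothetical record: seconds from the start of the recording,
    # plus the name of the cue to play or display at that moment.
    time_stamp: float
    indicator: str = "beep"


@dataclass
class RecordedMessage:
    duration: float
    tags: list = field(default_factory=list)  # kept sorted by time_stamp

    def add_tag(self, time_stamp, indicator="beep"):
        """Store a tag at record time, e.g. when the composer clicks TAG."""
        if not 0 <= time_stamp < self.duration:
            raise ValueError("tag lies outside the recording")
        insort(self.tags, HighlightTag(time_stamp, indicator))

    def tags_between(self, start, end):
        """Tags whose time stamps fall in the half-open interval [start, end)."""
        return [t for t in self.tags if start <= t.time_stamp < end]


def play(message, step=1.0):
    """Simulate playback in fixed steps and return the cues fired, in order."""
    fired = []
    position = 0.0
    while position < message.duration:
        for tag in message.tags_between(position, position + step):
            fired.append((tag.time_stamp, tag.indicator))
        position += step
    return fired
```

For instance, tagging a ten-second message at 7.5 s and 2.0 s and then calling `play` would fire the two cues in time order, regardless of the order in which the tags were added.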
The Expert accesses this recorded meeting from his
"corporate memory". He searches the recorded meeting 
(which appears in a second video window on his screen as 
would a live meeting, along with standard controls for 
stop/play/rewind/fast forward/etc.) for an event that will 
trigger his memory using his fast forward controls, but 
cannot locate the desired portion of the meeting. He then 
elects to search the ASCII text log (which was automatically 
extracted in the background after the meeting had been 
recorded, using the latest voice recognition techniques), but 
still cannot locate the desired portion of the meeting. Finally, 
he applies an information filter to perform a content-oriented 
(rather than literal) search and finds the portion of the 
meeting he was seeking. After quickly reviewing this short 
portion of the previously recorded meeting, the Expert 
responds to the caller's question, adjourns the call and 
concludes his office hours. 
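
The difference between the literal search of the text log and the content-oriented search described in this paragraph can be sketched as follows. This is a minimal illustration, assuming the log is a list of time-stamped text segments; the `LogEntry` type, both function names and the `related_terms` table are hypothetical, and the specification does not say how its information filter actually works.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    start: float  # seconds from the start of the recorded meeting
    text: str     # text recognized from the audio for this segment


def literal_search(log, phrase):
    """Return start times of segments containing the exact phrase."""
    needle = phrase.lower()
    return [e.start for e in log if needle in e.text.lower()]


def content_search(log, query_terms, related_terms):
    """Rank segments by how many query terms, or related words, appear.

    related_terms maps a query term to a set of assumed related words;
    a real information filter would be far more sophisticated.
    """
    scored = []
    for entry in log:
        words = set(entry.text.lower().split())
        score = 0
        for term in query_terms:
            candidates = {term.lower()} | related_terms.get(term.lower(), set())
            if words & candidates:
                score += 1
        if score:
            scored.append((score, entry.start))
    # Highest score first; earlier segments win ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [start for _, start in scored]
```

A literal query can miss a segment that discusses the topic in different words, while the content-oriented variant still returns its start time, which could then be used to position the playback controls of the recorded meeting.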

It should be noted that the above scenario involves many 
state-of-the-art desktop tools (e.g., video and information 
feeds, information filtering and voice recognition) that can
be leveraged by our Expert during videoconferencing, data 
conferencing and other collaborative activities provided by 
the present invention — because this invention, instead of 
providing a dedicated videoconferencing system, provides a 
desktop multimedia collaboration system that integrates into 
the Expert's existing workstation/LAN/WAN environment.

It should also be noted that all of the preceding collabo- 
rative activities in this scenario took place during a relatively 
short portion of the Expert's day (e.g., less than an hour of
cumulative time) while the Expert remained in his office and 
continued to utilize the tools and information available from 
his desktop. Prior to this invention, such a scenario would 
not have been possible because many of these activities 
could have taken place only with face-to-face collaboration, 
which in many circumstances is not feasible or economical 
and which thus may well have resulted in a loss of the 
associated business opportunities. 

Although the present invention has been described in 
connection with particular preferred embodiments and 
examples, it is to be understood that many modifications and 
variations can be made in hardware, software, operation, 
uses, protocols and data formats without departing from the 
scope to which the inventions disclosed herein are entitled. 
For example, for certain applications, it will be useful to 
provide some or all of the audio/video signals in digital 
form. Accordingly, the present invention is to be considered 
as including all apparatus and methods encompassed by the 
appended claims. 

We claim: 

1. A teleconferencing system for conducting a teleconfer- 
ence among a plurality of participants comprising: 

(a) first, second and third locations; 

(b) at least one workstation at each of the first, second and 
third locations, each workstation including audio and 
video capture and reproduction capabilities arranged to 
capture and reproduce participant video images and 
spoken audio; 

(c) a codec associated with each of the first, second and 
third locations; 

(d) an AV path linking the three locations; 

(e) a network switch associated with at least one of the 
first and second locations and designed to route codec 
compressed AV signals representing captured partici- 
pant video images and spoken audio along the AV path; 

(f) a switch in communication with the third location to 
route compressed AV signals, destined for the second 
location, from the first location to the second location
without the compressed AV signals being decom- 
pressed or reproduced at said third location; and 

(g) a Local Area Network switch functionally positioned 
between the workstation at and the codec associated
with one of the first and second locations to route 
signals to the workstation at that location, 

wherein the location at which the Local Area Network 
switch is positioned is a multi-participant location 
having a plurality of workstations, and

wherein the Local Area Network switch is arranged to 
receive signals from the codec at that location and route 
the received signals to a destination workstation of the 
workstation plurality. 

2. The teleconferencing system of claim 1, wherein said 
AV path includes dedicated links between said locations. 

3. The teleconferencing system of claim 1, wherein said 
AV path includes dial-up connections between said loca- 
tions. 

4. The teleconferencing system of claim 1, wherein said 
AV path includes dial-up connections and dedicated links 
between said locations. 

5. The teleconferencing system of claim 4, wherein said 
AV path includes a dial-up connection between at least two 
locations and a dedicated link between at least two locations. 

6. The teleconferencing system of claim 1, wherein the 
system is configured to optimize the routing of AV signals 
between said locations. 

7. The teleconferencing system of claim 6, wherein the
signal routing is optimized based on either the actual or the 
anticipated state of said AV path. 

8. The teleconferencing system of claim 1, wherein said 
AV path includes at least one trunk associated with at least 
one codec.

9. The teleconferencing system of claim 1, wherein

(a) the number of workstations at each multi-participant 
location is greater than the number of any one resource 
in the group consisting of codecs and network switches
associated with that location, and

(b) each of the workstations has access to any of the 
resources. 

10. The teleconferencing system of claim 1, further
comprising a directory containing participant location
information and wherein the compressed AV signals are routed using
participant information in the directory.

11. The teleconferencing system of claim 1, further com- 
prising: 

(a) a participant locator that responds to a participant
logging into a workstation by associating that participant
with each such workstation logged into, thereby
enabling the routing of a videoconference call, for that
participant, to the workstation at which that participant
is logged in.

12. The teleconferencing system of claim 1, wherein the
video image and spoken audio of a first participant at the first 
location, routed to said second location via said third 
location, can be reproduced at the workstations of both said 
first participant and a second participant at the second
location.

13. The teleconferencing system of claim 12, further 
comprising: 

(a) a data network providing a data path for carrying data 
signals among the workstations; and 
(b) a data conference manager configured to manage a
data conference during which shared data is displayed 
at the workstations of a plurality of participants. 
14. The teleconferencing system of claim 13, wherein the
AV and data paths define physically separate paths. 

15. The teleconferencing system of claim 13, further 
comprising a video mosaic generator configured to combine 
at least a portion of captured video images of said first and 
second participants to generate a mosaic image for
reproduction at at least one workstation.

16. The teleconferencing system of claim 15, further 
comprising a close-up selector for selecting the image of one 
of the participants in said mosaic image and replacing said 
mosaic image with the image of said selected image. 

17. The teleconferencing system of claim 15, wherein the 
mosaic image includes images of the first and second and
a third participant, the system further comprising an audio 
summer configured to receive the captured audio of said 
first, second and third participants and combining only the 
received audio of the second and third participants into an 
audio sum for reproduction at the workstation of said first 
participant.

18. The teleconferencing system of claim 17, wherein the 
AV reproduction capabilities of at least the workstation of 
the first participant includes a plurality of speakers, the 
system further comprising: 

(a) an audio control configured to control the reproduction 
of said audio sum at said first participant's workstation 
such that the composition of the audio originating from 
each of the second and third participants reproduced at
each speaker is dependent on a position of the images
of the second and third participants in said reproduced
mosaic image. 

19. The teleconferencing system of claim 17, further 
comprising an echo canceller to reduce echo during the 
reproduction of said audio sum. 

20. The teleconferencing system of claim 12, further 
comprising a video mosaic generator configured to combine 
the captured video images of at least said first and second 
participants into a mosaic image for reproduction at at least 
one workstation. 

21. The teleconferencing system of claim 20, further 
comprising a distributed video mosaic generator configured 
to combine a portion of said mosaic image with a captured 
image of a third participant to generate a distributed mosaic 
image of the captured images of said three participants for 
reproduction at the workstation of at least one of the three 
participants. 

22. The teleconferencing system of claim 20, wherein the 
mosaic image includes images of the first and second and a 
third participant, the system further comprising an audio
summer configured to receive the captured audio of said 
first, second and third participants and combining only the 
received audio of the second and third participants into an 
audio sum for reproduction at the workstation of said first 
participant.

23. The teleconferencing system of claim 22,
wherein the audio summer is configured to combine a part
of the audio sum with the captured audio of another
participant to generate a composite audio sum for
reproduction at the workstation of at least one
participant.

24. The teleconferencing system of claim 22, wherein the 
AV reproduction capabilities of at least the workstation of 
the first participant includes a plurality of speakers, the
system further comprising: 

(a) an audio control configured to control the reproduction 
of said audio sum at said first participant's workstation
such that the composition of the audio originating from 
each of the second and third participants reproduced at 
each speaker is dependent on a position of the images
of the second and third participants in said reproduced 
mosaic image. 

25. The teleconferencing system of claim 1, wherein
video images are reproduced at a workstation as full-motion
video.

26. The teleconferencing system of claim 25, wherein the 
full-motion video images are reproduced at a workstation at 
a rate of at least about 30 frames per second. 

27. The teleconferencing system of claim 20, further
comprising a close-up selector for selecting the image of one 
of the participants in said mosaic image and replacing said 
mosaic image with the image of said selected image. 

28. A method of conducting a teleconference among at
least one participant at each of first and second locations,
each location having at least one associated workstation with
audio and video capture and reproduction capabilities and a
codec and one of the locations having a Local Area
Network switch, the method comprising the steps of:
(a) linking the first and second locations with a third
location including at least one workstation with audio 
and video reproduction capabilities and at least one 
codec; 

(b) capturing video images and spoken audio of a
participant at the first location;

(c) compressing signals representing the captured video 
and audio at the first location; 

(d) routing compressed signals, destined for the second 
location, to the third location; 

(e) receiving the routed signals at the third location and
routing the received signals from the third to the second 
location without decompressing the received signals or 
reproducing the audio or video represented by the 
received signals at the workstation at the third location; 

(f) receiving the signals from the third location and
reproducing the audio and video, captured at the first
location, at the second location; and

(g) using the Local Area Network switch to route signals
representing captured participant audio and video
between the workstation and the codec at the location
having the switch.

29. The method of claim 28, further comprising the steps 
of: 

(a) routing the compressed signals to optimize their
transfer between the locations. 

30. The method of claim 29, wherein the optimization is 
based on either the actual or the anticipated state of the links 
between the locations. 

31. The method of conducting a teleconference of claim 
28, further comprising the steps of: 

(a) managing a data conference during which data is 
shared among a plurality of participants and displayed 
at associated workstations; and 

(b) managing a videoconference, during which the video 
image and spoken audio of one participant are 
reproduced at the workstation associated with another 
participant. 

32. The method of conducting a teleconference of claim 
31, further comprising the steps of: 

(a) combining at least a portion of the captured images of 
a first and a second participant into a mosaic image; and 

(b) reproducing the mosaic image at a workstation. 
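
As a rough illustration of the mosaic step, the sketch below treats each captured image as a list of pixel rows, crops a portion of each, and places the portions side by side. The image representation and the helper names `crop` and `make_mosaic` are assumptions made for the example; the claims do not specify a layout or data format.

```python
def crop(image, width):
    """Keep only the leftmost `width` pixels of every row."""
    return [row[:width] for row in image]


def make_mosaic(left, right):
    """Join two images of equal height side by side into one image."""
    if len(left) != len(right):
        raise ValueError("images must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# Example: take half of each participant's 2x2 image.
first = [[1, 1], [1, 1]]
second = [[2, 2], [2, 2]]
mosaic = make_mosaic(crop(first, 1), crop(second, 1))
```

The result is a single image that can be transmitted and reproduced as one video stream rather than two.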

33. The method of conducting a teleconference of claim 
32, further comprising the steps of: 

(a) selecting the image of one of the participants in the 
mosaic image; and 

(b) replacing the mosaic image with the image of the 
selected image. 

34. The method of conducting a teleconference of claim 
28, further comprising the steps of: 

(a) combining the captured images of a first and second 
participant into a mosaic image; and 

(b) reproducing the mosaic image at a workstation. 

35. The method of conducting a teleconference of claim 
34, further comprising the steps of: 

(a) combining a portion of the mosaic image with a 
captured image of another of the participants to gen- 
erate a composite mosaic image; and 

(b) reproducing the composite mosaic image at the 
workstation of at least one of the participants. 

36. The method of conducting a teleconference of claim 
34, further comprising the steps of: 

(a) receiving captured audio of the first and second 
participants and a third participant; 

(b) combining the received audio of only the second and 
third participants into an audio sum; and 

(c) reproducing the audio sum at the workstation 
associated with the first participant. 
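
The audio sum of claim 36 excludes the listener's own audio, so that a participant does not hear an echo of their own voice. A minimal sketch follows, assuming each participant's audio arrives as an equal-rate list of integer samples; the function name `audio_sum_for` and the sample format are hypothetical.

```python
def audio_sum_for(listener, streams):
    """Mix every stream except the listener's own, sample by sample.

    streams maps a participant name to a list of audio samples.
    The result is as long as the shortest contributing stream.
    """
    others = [
        samples for name, samples in streams.items() if name != listener
    ]
    return [sum(frame) for frame in zip(*others)]


streams = {
    "first": [1, 1, 1],
    "second": [10, 20, 30],
    "third": [100, 200, 300],
}
heard_by_first = audio_sum_for("first", streams)
```

Here the first participant hears only the second and third participants. A production mixer would also need to clip or normalize the sum, which this sketch omits.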

37. The method of conducting a teleconference of claim 
36, wherein the reproduced mosaic image includes images 
of the first, second and third participants, the method further 
comprising the step of: 

(a) reproducing the audio sum at the first participant's 
workstation such that the composition of the repro- 
duced audio is dependent on a position of the images of 
participants in the reproduced mosaic image. 
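
Claim 37 ties the composition of the reproduced audio to where each participant appears in the mosaic. One possible reading, sketched below, is stereo panning: a participant shown further to the right contributes more to the right channel. The linear pan law, the single-row layout, and the names `pan_gains` and `positional_sum` are all assumptions for illustration.

```python
def pan_gains(tile_index, tile_count):
    """Return (left_gain, right_gain) for a tile in a one-row mosaic."""
    if tile_count == 1:
        return (0.5, 0.5)
    position = tile_index / (tile_count - 1)  # 0.0 leftmost, 1.0 rightmost
    return (1.0 - position, position)


def positional_sum(streams, layout):
    """Mix streams into (left, right) channels according to layout.

    streams maps participant name to samples and holds only the
    participants to be heard; layout lists every participant shown
    in the mosaic, from left to right.
    """
    length = min((len(s) for s in streams.values()), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in streams.items():
        left_gain, right_gain = pan_gains(layout.index(name), len(layout))
        for i in range(length):
            left[i] += left_gain * samples[i]
            right[i] += right_gain * samples[i]
    return left, right
```

For a layout of `["first", "second", "third"]`, the second participant is centered and the third is heard entirely in the right channel.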

38. The method of conducting a teleconference of claim 
34, further comprising the steps of: 

(a) selecting the image of one of the participants in the 
mosaic image; and 

(b) replacing the mosaic image with the image of the 
selected image. 

39. The method of conducting a teleconference of claim 
28, further comprising the steps of: 

(a) receiving the captured audio of a first, second and third 
participant; 

(b) combining the received audio of only the second and 
third participants into an audio sum; and 

(c) reproducing the audio sum at the workstation of the first 
participant. 

40. The method of conducting a teleconference of claim 
39, wherein the reproduced mosaic image includes images 
of the first, second and third participants, the method further 
comprising the step of: 

(a) reproducing the audio sum at the first participant's 
workstation such that the composition of the reproduced 
audio is dependent on a position of the images of 
participants in the reproduced mosaic image. 

41. The method of conducting a teleconference of claim 
28, further comprising the steps of: 

(a) associating a participant with each workstation logged 
into by the participant; and 

(b) routing a call to initiate a videoconference with such 
participant to the workstation at which that participant 
is logged in. 
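
The login association of claim 41 can be pictured as a small directory that maps each participant to the workstations they are logged into, so that a call addressed to a person is directed to a machine. The class name `Directory`, its methods, and the choice to prefer the most recent login are assumptions made for this sketch; the claim does not say how to choose among several logins.

```python
class Directory:
    """Tracks which workstations each participant is logged into."""

    def __init__(self):
        self._logins = {}

    def log_in(self, participant, workstation):
        """Step (a): associate the participant with a workstation."""
        self._logins.setdefault(participant, []).append(workstation)

    def log_out(self, participant, workstation):
        """Drop the association when the participant logs out."""
        stations = self._logins.get(participant, [])
        if workstation in stations:
            stations.remove(workstation)

    def route_call(self, callee):
        """Step (b): pick the workstation that should receive the call.

        Assumption: the most recent login wins. Returns None when the
        callee is not logged in anywhere.
        """
        stations = self._logins.get(callee)
        return stations[-1] if stations else None
```

A caller therefore needs to know only the name of the person to reach, not the address of a particular workstation.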

42. The method of claim 28, wherein video images are 
reproduced at a workstation as full-motion video. 

43. The method of claim 42, wherein the full-motion 
video images are reproduced at a workstation at a rate of at 
least about 30 frames per second. 

***** 